Abstract

Clustering a graph, i.e., assigning its nodes to groups, is an important operation whose best known 
application is the discovery of communities in social networks. Graph clustering and community 
detection have traditionally focused on graphs without attributes, with the notable exception of edge 
weights. However, these models provide only a partial representation of real social systems, which are
therefore often described using node attributes, representing features of the actors, and edge attributes,
representing different kinds of relationships among them. We refer to these models as attributed 
graphs. Consequently, existing graph clustering methods have been recently extended to deal with 
node and edge attributes. This article is a literature survey on this topic, organizing and presenting 
recent research results in a uniform way, characterizing the main existing clustering methods and 
highlighting their conceptual differences. We also cover the important topic of clustering evaluation 
and identify current open problems. 
1 Introduction


Graphs represent one of the main models to study human relationships. For example, structural
properties of social systems can be measured by representing individuals and their
relationships as graphs and computing the centrality or prestige of their nodes (Wasserman & Faust,
1994). Similarly, once a social graph is available, groups of strongly connected individuals
(communities) can be identified using clustering algorithms. The application of graphs 
to the study of social systems motivated and is now a part of a broader discipline called 
network science, focused on the modeling and analysis of relationships between generic 
entities. This discipline provides a set of tools (methodologies, methods and measures) 
to improve our understanding of complex systems, including social and technological 
environments, transport and communication networks and biological systems. The wide 
applicability of network science largely relies on the adoption of graph-based models, which,
thanks to their generality, can be applied to a diverse range of scenarios.

However, researchers in social network analysis (SNA) and social sciences have long 
been aware of the potential value in representing additional information on top of the 
social graph, and of the potential loss in accuracy when simple nodes and edges are used 
to represent complex social interactions. For example, according to Wasserman & Faust
(1994), social networks contain at least three different dimensions: a structural dimension
corresponding to the social graph, e.g., actors and their relationships; a compositional
dimension describing the actors, e.g., their personal information; and an affiliation dimension
indicating group memberships. The existence of multiple relationship types, e.g., working
together, being friends or exchanging text messages, has also been studied for a long
time, as recently reported by Borgatti et al. (2009). This last aspect has been referred to as
multiplexity in the SNA tradition, and can be related to Goffman's concept of context, well
exemplified by the metaphor of individuals acting on multiple stages depending on their
audience (Goffman, 1974). As an example, Figure 1(b) highlights how an attributed graph
may lead to a deeper understanding of social interactions compared to the corresponding
graph without attributes in Figure 1(a).


[Figure 1: (a) a graph without attributes; (b) the same graph with node attributes (Age: 32, City: Paris; Age: 32, City: Paris; Age: 35, City: London) and edge labels (friend, work)]

Fig. 1. A graph (a) provides a simplified representation of a social system, which can
be easy to understand but may prevent a deep understanding of its structural and
compositional dimensions (b).



1.1 Current trends in attributed graph analysis and mining 


Attributed graphs have been used for decades to study social environments, and it has
long been recognized that the structure of a social network may not be sufficient to identify
its communities (Freeman, 1996; Hric et al., 2014). However, recent years have witnessed
a renewed interest in these models, partially motivated by the availability of real
data from on-line sources. One interesting aspect of real attributed graphs is the observed
dependency between who the actors are and how they interact, i.e., between the structural
and compositional dimensions. For example, La Fond & Neville (2010) have observed the
coexistence of social influence and homophily. Social influence states that people who are
linked are likely to have similar attributes, thus node attribute values can be interpreted as
a result of interactions with other nodes. At the same time, homophily implies that people
with similar attributes are likely to build relationships. These two related phenomena have
been observed in real networks by Kossinets & Watts (2006), and the dependency between
attributes and connectivity has been studied mathematically (Kim & Leskovec, 2012).

With this in mind, researchers have focused on attributed graph generators. Artificially
grown graphs are useful for testing algorithms and running simulations when real data are
difficult to collect. They are relevant for testing what-if scenarios and providing forecasts on
future evolutions, and can be used to design graph sampling algorithms when the size of the
original graphs would otherwise make the analysis impractical (Leskovec et al., 2005).
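The homophily mechanism described earlier in this section can be illustrated with a toy generator. The following Python sketch, which is a hypothetical illustration and not the model of any of the works cited here, links two nodes with probability `p_same` if they share an attribute value and with probability `p_diff` otherwise; the function names `homophily_graph` and `same_attribute_share` are ours.

```python
import random

def homophily_graph(attributes, p_same, p_diff, seed=None):
    """Toy attributed-graph generator (hypothetical, for illustration only).

    attributes: list with one attribute value per node (node ids are 0..n-1).
    p_same / p_diff: link probability for pairs with equal / different values.
    Returns a list of undirected edges (u, v) with u < v.
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    edges = []
    n = len(attributes)
    for u in range(n):
        for v in range(u + 1, n):
            p = p_same if attributes[u] == attributes[v] else p_diff
            if rng.random() < p:
                edges.append((u, v))
    return edges

def same_attribute_share(edges, attributes):
    """Fraction of edges joining nodes with equal attribute values."""
    if not edges:
        return 0.0
    same = sum(attributes[u] == attributes[v] for u, v in edges)
    return same / len(edges)

cities = ["Paris", "Paris", "London", "London", "Paris"]
edges = homophily_graph(cities, p_same=0.8, p_diff=0.1, seed=42)
print(edges, same_attribute_share(edges, cities))
```

Raising `p_same` relative to `p_diff` increases the share of same-attribute edges, a crude version of the dependency between structure and attributes discussed above.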

Earlier models, such as the well-known preferential attachment mechanism by Barabási & Albert
(1999), have focused on the social structure. The challenge now is to generate datasets
as close as possible to real-world social graphs, as done by Zheleva et al. (2009), where
affiliation information is also generated. This model captures previously studied properties
(e.g., a power-law distribution of social degree) but also provides new insights
regarding the processes behind group formation. More recently, Gong et al. (2011) have
proposed a generative social-attribute network model based on their empirical observations
of Google+ growth. Here, attributes describe user characteristics such as the name of the attended
school and group membership. Nan Du et al. (2010) and Magnani & Rossi (2013a) have instead
focused on the generation of graphs with interdependent attributes on the edges.
The idea that attributes and connections are generated in an interdependent way has led
to the development of specialized analysis methods. Several graph mining tasks have been
extended to attributed graphs, like link prediction (Getoor & Diehl, 2005; Rossetti et al., 2011;
Gong et al., 2011; Yang et al., 2011; Sun et al., 2012) or attribute inference (Li & Yeung, 2009;
Gong et al., 2011). This survey is dedicated to one of the most relevant and studied
operations on graphs and complex networks: graph clustering, often referred to as community
detection when social graphs are involved. We believe that this is an important and
timely effort to facilitate research in this still young area, in particular considering that the
discussed approaches have been introduced in different disciplines, often unaware of each
other.


1.2 Clustering attributed graphs 
Although several surveys on graph clustering have been written (Schaeffer, 2007; Fortunato,
2010; Aggarwal & Wang, 2010; Coscia et al., 2011), most of the approaches to cluster
attributed graphs are more recent and have not been included in these works. At the same
time, there is a large literature on (multi-dimensional) clustering of tabular data (Moise et al.,
2009; Han et al., 2011), but existing surveys in this area have not addressed extensions for
graph data. Attributed graph clustering can be seen as the confluence of these two fields,
the former focusing on the structural and the latter on the compositional aspects. In this 
article we focus on recent works resulting from this promising combination. 

The article is organized in three main parts: a review of methods for edge-attributed 
graphs, a review of methods for node-attributed graphs, and a section on practical issues 
including the evaluation of clusterings and the applicability of different approaches. We 
conclude by summarizing the status of the research and discussing the open problems that
are most promising in our view. Attributed graph clustering has
been independently studied in different disciplines; therefore, it is important to know how
different terms have been used in the literature. In Table 1 we list and briefly
explain the terms used in this article.


2 Clustering edge-attributed graphs 

One way to extend a graph model and to provide additional information to the clustering
algorithm is to represent the different kinds of edges among individuals. As an example, in
Figure 1(b) we can see that the relationship between the two left-most nodes consists of a
friendship and a working edge.


[Figure 2: (a) a multigraph with friend and work edges; (b) the same data represented as a set of interconnected graphs]

Fig. 2. Two alternative representations of the different edge types in a multigraph 


Different models have been used to represent this scenario 

(Minor, 1983; Lazega & Pattison. 
Table 1. Terminology used in this article and synonyms used in the literature

Main term | Synonyms | Meaning
Node | Vertex, site, actor | Basic component of a graph. As an example, a node may indicate that a user has an account on the social media site whose social network is represented by that graph.
Edge | Link, arc, tie, connection, bond, relation(ship) | A relationship between two nodes, e.g., a following relationship between two Twitter accounts. When there is an edge between two nodes we say that they are directly connected.
Graph | Network, social network, layer | A graph without attributes, neither on nodes nor on edges, with the exception of an optional numerical weight on edges indicating the strength of the connection. Edges may be directed or undirected.
Edge-attributed graph | Multiplex network, multi-layer graph, multidimensional network, edge-labeled multi-graph | Attributes indicate connections of different kinds or inside different graphs. With this term we do not indicate the presence of weights, in which case we explicitly talk of weighted graphs/edges.
Node-attributed graph | Node-labeled graph, graph with feature vectors | A feature vector is associated with each node and contains information about it, e.g., age, nationality, language, income.
Attributed graph | Attribute graph, social and affiliation network, relational data, multidimensional network | An edge-attributed graph, or a node-attributed graph, or both.
Layer | Aspect, dimension | Sometimes all the edges with the same attribute value in an edge-attributed graph are indicated as a layer, e.g., the Facebook friendship, spatial proximity, Twitter following, colleague or family layers in an attributed graph indicating different types of social relationships.
Clustering | Community structure | Assignment of each node to one or more groups of nodes, called clusters. Different criteria can be used to determine whether two nodes should belong to the same cluster.
Partition | Non-overlapping clustering | A clustering where each node is assigned to exactly one cluster.
times emphasizing the different roles played by individuals with respect to different
networks (Magnani & Rossi, 2011), including different kinds of nodes (Cai et al., 2005) or
providing a more general data model to mathematically represent a graph with attributes
on both nodes and edges (Kivelä et al., 2014). In Figure 2 we can see two alternative
representations of the same data, as a multigraph (a) and as a set of interconnected graphs
(b). The former, sometimes referred to as a multiplex network, focuses on a single set of
nodes that may have complex relationships between them:

Definition 1 (Multi-relational edge-attributed graph)

Given a set of nodes $N$ and a set of labels $L$, an edge-attributed graph is a triple $G = (V, E, l)$ where $V \subseteq N$, $(V, E)$ is a multigraph and $l : E \to L$. Each edge $e \in E$ in the graph has an associated label $l(e)$.

The latter emphasizes how the same node can belong to multiple (social) graphs, also 
known as layers: 

Definition 2 (Multi-layer edge-attributed graph)

Given a set of nodes $N$ and a set of labels $L$, an edge-attributed graph is defined as a set of graphs $G_i = (V_i, E_i)$ where $V_i \subseteq N$ and $E_i \subseteq V_i \times V_i$. Each graph $G_i$ has an associated unique name $l_i \in L$.
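The two definitions can be sketched with plain Python containers. In this hypothetical example, Definition 1 is a list of labeled edges `(u, v, label)` and Definition 2 is a dictionary mapping each label to a graph `(nodes, edges)`; the helper names and node names are ours. The sketch assumes undirected edges without self-loops, and, going from Definition 1 to Definition 2, it collapses parallel edges that carry the same label.

```python
def to_layers(labeled_edges):
    """Definition 1 -> Definition 2: one graph (nodes, edges) per label."""
    layers = {}
    for u, v, label in labeled_edges:
        nodes, edges = layers.setdefault(label, (set(), set()))
        nodes.update((u, v))
        edges.add(frozenset((u, v)))  # undirected edge
    return layers

def to_labeled_edges(layers):
    """Definition 2 -> Definition 1: a sorted list of (u, v, label) triples."""
    triples = []
    for label, (_, edges) in layers.items():
        for edge in edges:
            u, v = sorted(edge)
            triples.append((u, v, label))
    return sorted(triples)

# Two parallel edges (friend and work) between ann and bob
multigraph = [
    ("ann", "bob", "friend"),
    ("ann", "bob", "work"),
    ("bob", "carl", "friend"),
    ("carl", "dave", "work"),
]
layers = to_layers(multigraph)
print(sorted(layers["friend"][0]))  # ['ann', 'bob', 'carl']
```

In this small example the two representations carry the same information, matching the remark that follows that they are equivalent here; isolated nodes of a layer, allowed by Definition 2, would not survive the round trip through labeled edges.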

Although very similar, and in this specific example equivalent, these two representations 
emphasize different aspects of an edge-attributed graph. It is important to understand that
the methods covered in the remainder of this section have been developed starting from
specific models, which has influenced their features. Researchers using the first model have mainly
focused on the reduction of different edge types to single edges, while researchers using
the second model have looked for clusters spanning different layers and for nodes belonging to
multiple clusters depending on the edge type. With this difference in mind, in the following
we will formally represent both scenarios using the second (more general) model, where a
family of graphs possibly containing common nodes represents the different kinds of edges.
A larger working example is shown in Figure 3(a).

More general definitions have been provided in the literature, where one node in one 
graph can correspond to multiple nodes in another. This includes the case of online social 
media, where the same user can open multiple accounts on some services (Magnani & Rossi,
2011), and the case of non-social networks containing different kinds of nodes, such as a
power grid and a control network, where one node in a network can be related to multiple
nodes in another (Gao et al., 2011). Similarly, the model introduced by Kivelä et al. (2014)
allows the presence of attributes both on nodes and edges. For the sake of simplicity we 
focus on the simpler definitions above, because they are the ones used by almost all works 
on clustering social networks to date. Also, notice that we focus on nominal attributes, e.g. 
work and friendship: the case where attributes are only numeric, that is, weighted graphs, 
has already been treated in depth in existing surveys. However, we will deal with numeric 
weights when these are used inside algorithms for nominal attributes. 


2.1 Single-layer approaches 

A basic approach to deal with edge-attributed graphs is to flatten them: to reconstruct a
single weighted graph so that existing clustering methods can be indirectly applied. This
approach, exemplified in Figure 3(b), is not restricted to clustering but can be applied to
any operation defined on weighted graphs. Weights can be computed straightforwardly so
that an edge between two nodes has a weight proportional to the number of graphs where
the two nodes are directly connected.

Fig. 3. An edge-attributed graph, corresponding to a set of interconnected graphs defined
on a common superset of individuals (a). An indirect way to process it is to reduce it
to a single weighted graph, then apply classical clustering algorithms (b). A significantly
different approach is to look at exclusive connections (c).


Definition 3 (Flattening)

A flattening of an edge-attributed graph $\{G_i\}$ is a weighted graph $(V_f, E_f, w_f)$ where $E_f = \bigcup_i E_i$, $V_f = \bigcup_i V_i$ and $w_f(u, v) = \frac{|\{i : (u, v) \in E_i\}|}{n}$, where $n$ is the total number of graphs.
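A minimal Python sketch of Definition 3, assuming each layer is given only as a list of undirected edges (so nodes without edges, allowed by Definition 2, are not represented). The function name `flatten` and the example layers are hypothetical.

```python
from collections import Counter

def flatten(layers):
    """Flattening of Definition 3.

    layers: dict mapping a layer name to an iterable of undirected edges (u, v).
    Returns (nodes, weights), where weights maps frozenset({u, v}) to the
    number of layers containing that edge divided by the number of layers n.
    """
    n = len(layers)
    nodes = set()
    counts = Counter()
    for edges in layers.values():
        # Deduplicate inside a layer so that each layer counts at most once.
        for edge in {frozenset(pair) for pair in edges}:
            counts[edge] += 1
            nodes.update(edge)
    weights = {edge: c / n for edge, c in counts.items()}
    return nodes, weights

layers = {
    "friend": [("a", "b"), ("b", "c")],
    "work": [("a", "b"), ("c", "d")],
    "family": [("a", "b")],
}
nodes, w = flatten(layers)
print(w[frozenset(("a", "b"))])  # 1.0 (present in all three layers)
```

Dividing by `n` does not change the result of most weighted clustering methods (modularity, for instance, is invariant to a uniform rescaling of weights), so the raw counts could be used equally well.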


Berlingerio et al. (2011a) follow this approach. However, the same authors point out that
this solution may discard relevant information, e.g., the fact that some attribute values
(or graph layers) are more important than others to define a cluster. Tang et al. (2011)
propose a more general framework where the information about the multiple edge types
is considered during one of four different components of the community detection
process, network flattening being one of them. Nevertheless, the authors point out that
this kind of integration requires edges of different types to share the same community
structure. Therefore, it is not suitable for cases where the structures vary significantly
across dimensions.

An antithetical approach, acknowledging the importance of edge-attributed models but still
not considering clusters that can span several graphs, is introduced by Bonchi et al. (2012).
While flattening tends to assign nodes directly connected on multiple graphs to the same
group, because they get connected by a strong edge in the flattened graph, Bonchi et al.
(2012) consider a set of nodes a good cluster if their relationships are as specific and
homogeneous as possible, i.e., they are mainly connected through the same edge type. An
example is presented in Figure 3(c), where the three nodes marked in black are connected
with each other in the middle layer but share only one single edge on all other layers,
representing a good cluster according to this approach¹.


1 Please notice that this specific example is not compatible with the original model by Bonchi et al.
(2012), where individuals are allowed to be directly connected only on one of the layers. However,
it retains its underlying intuition. While this work was not originally intended to be applied to this
domain, it still presents an alternative point of view worth mentioning.
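To make the idea of "specific and homogeneous" relationships concrete, the following Python sketch computes a simple proxy for a candidate node set: the fraction of its internal edges carried by the most frequent layer. This is a hypothetical illustration of the intuition only, not the objective function used by Bonchi et al. (2012); the function name and example data are ours, and each layer is assumed to be a simple undirected graph given as an edge list.

```python
from collections import Counter

def dominant_layer_share(nodes, layers):
    """Hypothetical homogeneity proxy for a candidate cluster.

    Counts, per layer, the edges with both endpoints in `nodes`, and returns
    the most frequent layer with the fraction of internal edges it carries.
    """
    nodes = set(nodes)
    counts = Counter()
    for name, edges in layers.items():
        for u, v in edges:
            if u in nodes and v in nodes:
                counts[name] += 1
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    name, top = counts.most_common(1)[0]
    return name, top / total

# Three nodes forming a triangle on the middle layer, with single edges elsewhere
layers = {
    "top": [("a", "b")],
    "middle": [("a", "b"), ("b", "c"), ("a", "c")],
    "bottom": [("b", "c"), ("c", "d")],
}
print(dominant_layer_share({"a", "b", "c"}, layers))  # ('middle', 0.6)
```

A value close to 1 means the group is held together mainly by one edge type, which is the situation this approach favors; flattening, by contrast, would reward the pairs that are connected on several layers.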
The next sections are devoted to methods aiming at identifying clusters spanning multiple
layers. They are mostly extensions of quality measures traditionally used in graph
clustering, modularity and quasi-cliques being two prominent examples.


2.2 Extension of modularity 


Modularity is a measure of how well the nodes in a graph can be separated into dense
and independent components (Newman & Girvan, 2004). Figure 4 shows four graphs with
their nodes assigned to two communities (black and white) and the modularities resulting
from these assignments. These examples clearly show that the assignments putting
together highly interconnected nodes and separating groups of nodes with only a few
connections between them get a higher value of modularity. It is worth noticing that modularity
is not a method to find communities, but only a quality function. However, it can be directly
optimized or used inside community detection methods to guide the clustering process.
Although this measure suffers from some well-known pitfalls (Fortunato & Barthélemy,
2007; Lancichinetti & Fortunato, 2011), it has recently been at the basis of several graph
clustering methods and it has also been extended to deal with attributed graphs. Let us
briefly introduce it, to simplify the later explanation of its extension. Modularity is
expressed as

$$Q = \frac{1}{2m} \sum_{ij} \left( a_{ij} - \frac{k_i k_j}{2m} \right) \delta(\gamma_i, \gamma_j) \qquad (1)$$


where $\delta(\gamma_i, \gamma_j)$ is the Kronecker delta, which returns 1 when nodes $i$ and $j$ belong to the
same cluster and 0 otherwise. Therefore, the sum is computed only for those pairs of nodes
that are inside the same cluster. For each of these pairs, the presence of an edge between
them improves the quality of the assignment: $a_{ij}$ equals 1 when there is an edge between
$i$ and $j$, 0 otherwise. As we are dividing everything by $2m$ (twice the number of edges in the
graph, since the sum runs over ordered pairs and thus counts each edge twice), edges between
nodes belonging to different clusters negatively affect modularity
because they are not considered in the numerator (as $\delta(\gamma_i, \gamma_j) = 0$), but are counted in the
denominator. Finally, the formula considers the fact that two nodes with high degree
would be more likely to end up in the same cluster by chance, therefore their contribution
is reduced by $\frac{k_i k_j}{2m}$, where $k_i$ and $k_j$ are the degrees of $i$ and $j$.
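Eq. (1) can be checked with a direct, quadratic-time Python sketch for a simple, undirected, unweighted graph. The function name `modularity` and the example graph (two triangles joined by one edge) are our own illustration; real implementations use faster per-cluster sums.

```python
def modularity(edges, membership):
    """Modularity of Eq. (1) for a simple, undirected, unweighted graph.

    edges: list of (u, v) pairs; membership: dict node -> cluster label.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2m = sum of degrees
    q = 0.0
    for i in adj:
        for j in adj:  # ordered pairs (i, j), including i == j
            if membership[i] != membership[j]:
                continue  # Kronecker delta is 0
            a_ij = 1 if j in adj[i] else 0
            q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m

# Two triangles joined by a single edge
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "black", 1: "black", 2: "black", 3: "white", 4: "white", 5: "white"}
print(round(modularity(edges, split), 3))  # 0.357
```

Putting all nodes into a single cluster gives Q = 0, because the degree term then cancels the edge term exactly; the natural two-triangle split scores 5/14.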


Now it should be easier to understand the extension of modularity proposed by Mucha et al.
(2010) for edge-attributed graphs. Let us consider Figure 5: here we have emphasized how
the same individual i can be present in multiple graphs at the same time. For example, i
and j are directly connected on graphs r and s, where r and s represent two different edge
types. Notice that in this example we have three graphs, i.e., three edge types, and that j is
assigned to two different clusters: gray in graph r and white in graphs s and t.


2 Please notice that modifications of this formula have been proposed to make it more adaptable to
different datasets. One typical addition is a resolution parameter, which we have omitted from the
following equations because it is orthogonal to our discussion.
[Figure 4: four clustered graphs; the recoverable modularity values are Q = -0.24, Q = 0.28 and Q = 0]

Fig. 4. Modularity of four graph clusterings: nodes in each graph are assigned to two
clusters (black and white); the modularity of each assignment is reported under the graph

Fig. 5. An edge-attributed graph with three kinds of edges, represented as three
interconnected graphs. Nodes have been assigned to three clusters (black, gray and white)


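Thus, the extended version of the modularity can be expressed as

$$Q_m = \frac{1}{2\mu} \sum_{ijsr} \left[ \left( a_{ijs} - \frac{k_{is} k_{js}}{2m_s} \right) \delta(s, r) + c_{jsr}\, \delta(i, j) \right] \delta(\gamma_{is}, \gamma_{jr}) \qquad (2)$$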


This extended quality function involves not just all pairs of nodes $(i, j)$ but also all
pairs of graphs $(s, r)$. $\mu$ and $\delta(\gamma_{is}, \gamma_{jr})$ correspond respectively to $m$ and $\delta(\gamma_i, \gamma_j)$ in the
modularity formula, where $\mu$ also considers the connections between different graphs:
we say that there is a connection between two graphs $r$ and $s$ whenever they contain a
common node $j$, which increases $\mu$ by $c_{jsr}$. $\delta(\gamma_{js}, \gamma_{jr})$ allows the same node to be assigned to
different clusters in different graphs. The sum is now made of two components. One
is only computed when two nodes in the same graph are considered (because of $\delta(s, r)$),
corresponding to modularity. In fact, here $a_{ijs} = 1$ when $i$ and $j$ are directly connected in
graph $s$, and $k_{is}$ is the degree of node $i$ in the same graph. The second component, $c_{jsr}$, is
only computed when we are considering the same node $j$ inside two different graphs $r$ and
$s$. This term increases the quality function by $c_{jsr}$ (typically a constant value ranging from
0 to 1) whenever we assign the same individual to the same cluster on different graphs.
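The following Python sketch evaluates Eq. (2) for a given assignment, under the simplifying assumption, consistent with the figures in this section, that $c_{jsr}$ is a single constant `omega` for every pair of distinct layers ($s \neq r$) in which node $j$ appears, and 0 for $s = r$. Layers are edge lists, and a node is considered present in a layer only if it has an edge there; the function name and data layout are ours.

```python
def multilayer_modularity(layers, membership, omega):
    """Sketch of Eq. (2) with a constant coupling c_jsr = omega for s != r.

    layers: dict layer name -> list of undirected edges (u, v).
    membership: dict (node, layer) -> cluster label, for every node in every
    layer where it has at least one edge.
    """
    adj = {}
    for s, edges in layers.items():
        a = adj.setdefault(s, {})
        for u, v in edges:
            a.setdefault(u, set()).add(v)
            a.setdefault(v, set()).add(u)

    intra, inter, two_mu = 0.0, 0.0, 0.0
    # First component: single-layer modularity terms (delta(s, r) = 1).
    for s, a in adj.items():
        two_m_s = sum(len(nbrs) for nbrs in a.values())
        two_mu += two_m_s
        for i in a:
            for j in a:
                if membership[(i, s)] == membership[(j, s)]:
                    a_ijs = 1 if j in a[i] else 0
                    intra += a_ijs - len(a[i]) * len(a[j]) / two_m_s
    # Second component: the same node j in two different layers (delta(i, j) = 1).
    nodes = {j for a in adj.values() for j in a}
    for j in nodes:
        present = [s for s in adj if j in adj[s]]
        for s in present:
            for r in present:
                if s != r:
                    two_mu += omega  # couplings also enter the normalization 2*mu
                    if membership[(j, s)] == membership[(j, r)]:
                        inter += omega
    return (intra + inter) / two_mu

triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
layers = {"r": triangles, "s": triangles}
same = {(n, s): (0 if n < 3 else 1) for n in range(6) for s in layers}
print(round(multilayer_modularity(layers, same, omega=0.0), 3))  # 0.357
print(round(multilayer_modularity(layers, same, omega=1.0), 3))  # 0.55
```

With `omega = 0` the value reduces to ordinary modularity computed on the layers independently; with a positive `omega`, assigning a node to different clusters in different layers forfeits the coupling reward, which is the trade-off discussed next.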

One practical problem in using this measure is setting the $c_{jsr}$ parameter. Setting it to
0 for all nodes and graphs, clusters are identified on each single graph independently
of each other. If $c_{jsr}$ is high, e.g., 1, it becomes unlikely to assign the same individuals
to different clusters on different graphs. Other practical aspects to consider are the fact
that the part of the formula corresponding to traditional modularity can give a negative
contribution, which is not true for the part taking care of inter-network relationships, and
also the fact that the contribution of inter-network relationships grows quadratically in the
number of networks while the modularity part only grows linearly. However, while the
choice of appropriate parameters deserves more research, this extended definition of
modularity can be directly used to find clusters by using any modularity-optimization
heuristic, as done by Mucha et al. (2010), or paired with a concept of betweenness to extend
the Girvan-Newman algorithm. The definition of betweenness for edge-attributed graphs
follows directly from any definition of distance involving multiple graphs (Bródka et al.,
2011; Magnani et al., 2013).


Figure 6 shows the values of modularity for four different multi-graphs and three
different settings for the inter-graph parameter $c_{jsr}$ (which is kept constant for all nodes
and graphs). The figure emphasizes the different components of this measure. At the top
we can see two clusterings aligned with both the single-graph and multi-graph structure.
In particular, groups of nodes sharing several edges belong to the same cluster, and the
same nodes on different graphs tend to belong to the same cluster. However, the top-right
example shows that we can assign a node to different clusters in different graphs.

Modularities computed using different values of $c_{jsr}$ cannot be compared: increasing
$c_{jsr}$ also increases the absolute value of modularity. However, we can see that the increase
in the top-right figure is proportionally lower than the one on the left (from .54 to .62
and from .48 to .68, respectively). This is determined by the nodes assigned to multiple clusters.

The two lower figures show examples of lower modularity, i.e., clusterings not following
the structure of the graphs. The lower-left image has a low overall intra-graph modularity,
which can be seen when $c_{jsr} = 0$ and thus inter-graph connections are not considered.
When we also consider them ($c_{jsr} = 0.5$ and $c_{jsr} = 1$), we can see that modularity
increases much more in the lower-left graph than in the lower-right one, where every node
belongs to both clusters on different layers.


2.3 Clique-finding methods 

Another concept used to discover clusters in graphs is the clique, i.e., a complete (sub)graph. 
Although this is one of the basic concepts in graph theory and it is thus well known, we 
briefly recall it. 

Definition 4 (Clique)

A clique is a set of nodes directly connected to all other nodes in the clique. 

Definition 5 (Maximal clique) 

A maximal clique is a clique that is not contained in a larger clique. 

Figure 7(a) shows an example of a clique. Any three nodes in Figure 7(a) still make a
clique, but not a maximal one because we can add the fourth node and still have a clique.
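Definitions 4 and 5 translate directly into set checks. This Python sketch uses adjacency sets; the helper names `adjacency`, `is_clique` and `is_maximal_clique` are ours, and the example graph is a four-node clique like the one in Figure 7(a).

```python
from itertools import combinations

def adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_clique(nodes, adj):
    """Definition 4: every pair of distinct nodes is directly connected."""
    return all(v in adj.get(u, set()) for u, v in combinations(nodes, 2))

def is_maximal_clique(nodes, adj):
    """Definition 5: a clique that no outside node can extend."""
    nodes = set(nodes)
    if not is_clique(nodes, adj):
        return False
    return not any(
        all(w in adj.get(x, set()) for x in nodes)  # w adjacent to every member
        for w in set(adj) - nodes
    )

# A four-node clique
adj = adjacency([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)])
print(is_clique({1, 2, 3}, adj), is_maximal_clique({1, 2, 3}, adj))  # True False
print(is_maximal_clique({1, 2, 3, 4}, adj))  # True
```

These checks only verify a given node set; enumerating all maximal cliques requires a dedicated search algorithm.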

A (maximal) clique clearly corresponds to a cluster. However, large cliques are difficult
to find in real data because the absence of a single edge is sufficient to break the clique,
and in social graphs edges can be missing for many reasons, e.g., because of unreported
data or just because even in a tight group there can be two individuals who do not get
along. Therefore, when clustering is applied to social graphs, it is wiser to look for more
relaxed structures called quasi-cliques.

For example, Freeman (1996) studies the cliques gathered from interviews with a group of
individuals and acknowledges that they are not enough for defining communities.

Fig. 6. Multi-layer modularity of four graph clusterings: nodes in each graph are assigned
to two clusters (black and gray); the modularity of each assignment is reported under the
graph using three settings: $c_{jsr} = 0$, $c_{jsr} = 0.5$ and $c_{jsr} = 1$


Definition 6 (Quasi-clique)

A $\gamma$-quasi-clique is a set of nodes where each node is directly connected to at least a fraction $\gamma$ of the
other nodes in the quasi-clique.

Algorithms to discover quasi-cliques take $\gamma$ as a parameter. Please notice that similar
alternative definitions are possible, e.g., using a strict inequality or computing the fraction over
all nodes in the quasi-clique; the underlying concept remains the same. In Figure 7(b)
we have illustrated a 0.5-quasi-clique, and in Figure 7(c) we have four nodes that do not
constitute a 0.5-quasi-clique because the white node is directly connected to only one third
of the other nodes.
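A Python sketch of Definition 6, using the non-strict inequality and the fraction over the other nodes; the helper names and example graphs are hypothetical. The second example mirrors the situation of Figure 7(c): a node adjacent to only one of the three other nodes.

```python
def adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_quasi_clique(nodes, adj, gamma):
    """Definition 6: each node is adjacent to at least a fraction gamma
    of the *other* nodes in the set (non-strict inequality)."""
    nodes = set(nodes)
    if len(nodes) < 2:
        return True
    required = gamma * (len(nodes) - 1)
    return all(len(adj.get(u, set()) & (nodes - {u})) >= required for u in nodes)

# A 4-cycle: every node reaches 2 of the 3 others, so it is a 0.5-quasi-clique.
cycle = adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
# A triangle plus a node w linked to only one of the other 3 nodes.
tail = adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("w", "a")])
print(is_quasi_clique("abcd", cycle, 0.5))  # True
print(is_quasi_clique("abcw", tail, 0.5))   # False
```

Setting `gamma = 1.0` recovers the clique condition of Definition 4.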


Fig. 7. A clique (a), a quasi-clique (b) and four nodes not making a 0.5-quasi-clique (c)
The problem of finding quasi-cliques in a graph is NP-hard. Unless P = NP, this implies
that no algorithm can solve this problem exactly in polynomial time, so exact methods
quickly become impractical as graphs grow. However, efficient algorithms which do not
guarantee the identification of all quasi-cliques have been proposed.

As previously mentioned, the most common interpretation of clusters in edge-attributed
graphs states that multiple kinds of edges between two individuals strengthen their
relationship. Therefore, Pei et al. (2005) have introduced algorithms to discover quasi-cliques
in all graphs, and Wang et al. (2006) and Zhiping Zeng et al. (2006) to identify quasi-cliques in at
least a given percentage of graphs (where this threshold is called support).
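The support condition can be illustrated by a Python sketch that evaluates one candidate node set. It is not the mining algorithm of Pei et al. (2005), Wang et al. (2006) or Zeng et al. (2006), which search for such sets efficiently; it only computes the fraction of graphs in which a given set is a $\gamma$-quasi-clique. Function names and data are hypothetical.

```python
def adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_quasi_clique(nodes, adj, gamma):
    """Definition 6 with a non-strict inequality."""
    nodes = set(nodes)
    if len(nodes) < 2:
        return True
    required = gamma * (len(nodes) - 1)
    return all(len(adj.get(u, set()) & (nodes - {u})) >= required for u in nodes)

def support(nodes, layers, gamma):
    """Fraction of layers (graphs) in which `nodes` is a gamma-quasi-clique."""
    hits = sum(is_quasi_clique(nodes, adjacency(edges), gamma)
               for edges in layers.values())
    return hits / len(layers)

triangle = [("a", "b"), ("b", "c"), ("a", "c")]
layers = {"school": triangle, "basketball": triangle, "work": [("a", "d")]}
print(round(support("abc", layers, 0.5), 2))  # 0.67
layers["family"] = [("b", "e")]  # one more edge type where they are not a group
print(support("abc", layers, 0.5))  # 0.5
```

The second call shows that adding an unrelated edge type lowers the support of a group from 2/3 to 1/2 without changing any of its edges.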


While not based on quasi-cliques, the ABACUS algorithm by Berlingerio et al. (2013) 
also applies a similar definition, coming from the frequent itemset mining problem. First, 
clusters are identified in each graph; then, individuals that are in the same cluster in at
least a given percentage of graphs are also included into a global cluster in the final result.
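The two-step idea can be sketched in Python as follows. This is not the ABACUS algorithm itself, which the text relates to frequent itemset mining; it is a brute-force simplification, exponential in the number of layers, that enumerates every subset covering at least `min_fraction` of the layers and groups nodes sharing a cluster in all layers of the subset. The input is assumed to be the output of a per-layer clustering; all names and data are hypothetical.

```python
import math
from itertools import combinations

def shared_clusters(memberships, min_fraction):
    """Brute-force sketch (not the ABACUS algorithm itself).

    memberships: dict layer -> dict node -> cluster id, i.e. the result of
    clustering each graph separately (the first step described in the text).
    Returns the maximal groups of at least two nodes that share a cluster in
    every layer of some subset covering at least min_fraction of the layers.
    """
    layers = sorted(memberships)
    k = max(1, math.ceil(min_fraction * len(layers)))
    found = set()
    for size in range(k, len(layers) + 1):
        for subset in combinations(layers, size):
            common = set.intersection(*(set(memberships[s]) for s in subset))
            groups = {}
            for node in common:
                key = tuple(memberships[s][node] for s in subset)
                groups.setdefault(key, set()).add(node)
            found.update(frozenset(g) for g in groups.values() if len(g) >= 2)
    # Keep only groups not strictly contained in another group.
    return {g for g in found if not any(g < h for h in found)}

memberships = {
    "L1": {"a": 0, "b": 0, "c": 0, "d": 1},
    "L2": {"a": 5, "b": 5, "c": 5, "d": 5},
    "L3": {"a": 1, "b": 2, "c": 2, "d": 1},
}
print(sorted(sorted(g) for g in shared_clusters(memberships, 2 / 3)))
# [['a', 'b', 'c'], ['a', 'd']]
```

Requiring all three layers (`min_fraction = 1.0`) leaves only the pair b, c, which share a cluster everywhere.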

It is worth noticing that quasi-clique clustering methods were first developed for generic 
graph databases without focusing on the application domain of social graphs. In this specific
domain, while we may agree that a cluster spanning all the graphs represents a
strong global cluster, a group of nodes sharing a large number of edges on a few specific 
graphs may also identify a cluster of interest. For example, we might find that a group of 
individuals goes to the same school and plays in the same basketball team. This is a strong 
relationship that should not be negatively affected by the existence of other relationships 
where they do not form a group. However, adding other edge types to the attributed graph 
(which corresponds to adding new graphs to the multi-layer graph structure) would reduce 
their support. 


The approach proposed by Boden et al. (2012) starts from this consideration and looks 
for sets of nodes that make a cluster in each single graph of any subset of the graphs in 
an edge-attributed model. This work also considers the case of weighted graphs, but this is 
peculiar to this method and we will not provide additional details here. 


2.4 Emerging clusters 

We conclude this section by presenting a hypothesis, still unverified in the literature, that in our
opinion might lead to the development of new clustering methods. The hypothesis is that
clusters can emerge when a specific combination of graphs is considered, and disappear
when more graphs are added to the model.

In Figure 8 the idea is illustrated on a simple example. The analysis of the three graphs
together (right-hand side of the figure) does not reveal any interesting patterns, as there are
too many edges in the graph. The same can be observed for each single graph (on the left).
However, choosing two specific layers, some more evident clusters emerge (center, clusters
denoted by black and white nodes). None of the previously presented approaches seems to
be able to find such clusterings, because they require every cluster to be present in at least
one of the single graphs.

This hypothesis would also explain the difficulty of finding good clusterings
in real social graphs. In fact, although several clustering algorithms exist, in practice
they achieve good results when some more or less well-separated clusters exist. This is
strictly related to the way in which community detection algorithms have been defined:
some try to maximize modularity, favoring well-separated clusters; some use random walk
approaches, where the probability that a walker crosses between two clusters is proportional to the
number of edges between them; some exploit measures like betweenness, which is high for
the few edges connecting otherwise distinct portions of the graph (Fortunato, 2010). However, when
we deal with on-line relationships, clustering becomes extremely hard. According to our
hypothesis, this depends on the fact that a large number of semantically different layers are
considered all together, determining the co-existence of several overlapping clusters and
a case of information overload.

Fig. 8. Emerging clusters: well-separated clusters appear when a specific subset of the
graphs is used, but disappear when fewer or more networks are added

In summary, if we consider Figure 8 (right side), we would not expect any clustering algorithm to find evident clusters. However, in theory, clusters may appear when the multi-layer organization of the edges is unfolded in specific ways, e.g., by retaining only the two layers in Figure 8 (center). Therefore, the problem shifts from being purely algorithmic (e.g., how do we find the best cut?) toward aspects like the choice of the data model, data preprocessing and feature selection.
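To make the emerging-cluster hypothesis concrete, the following sketch (ours, not taken from the cited literature) flattens every non-empty subset of layers by taking the union of their edges, and scores one fixed candidate partition with Newman-Girvan modularity. The layers x, y and z are hypothetical: x and y each contain one tight group, while z only contains edges across the two groups. In this toy setting the partition only becomes visible (positive modularity) when exactly x and y are combined. Exhaustive enumeration is exponential in the number of layers, so this is an illustration of the idea rather than a practical method.

```python
from itertools import combinations


def modularity(edges, partition):
    """Newman-Girvan modularity of an unweighted, undirected edge set.

    edges: set of frozenset({u, v}); partition: dict mapping node -> cluster id.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = {}
    for edge in edges:
        for node in edge:
            degree[node] = degree.get(node, 0) + 1
    # Observed fraction of edges whose endpoints share a cluster.
    inside = sum(1 for edge in edges if len({partition[n] for n in edge}) == 1)
    # Expected fraction under the configuration model: sum over clusters of (d_c / 2m)^2.
    degree_per_cluster = {}
    for node, d in degree.items():
        c = partition[node]
        degree_per_cluster[c] = degree_per_cluster.get(c, 0) + d
    return inside / m - sum((d / (2 * m)) ** 2 for d in degree_per_cluster.values())


def score_layer_subsets(layers, partition):
    """Flatten every non-empty subset of layers (edge union) and score the partition."""
    names = sorted(layers)
    scores = {}
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            flat = set().union(*(layers[name] for name in subset))
            scores[subset] = modularity(flat, partition)
    return scores


def e(u, v):
    return frozenset((u, v))


layers = {
    "x": {e(1, 2), e(2, 3), e(1, 3)},                             # tight group {1, 2, 3}
    "y": {e(4, 5), e(5, 6), e(4, 6)},                             # tight group {4, 5, 6}
    "z": {e(1, 4), e(2, 5), e(3, 6), e(1, 5), e(2, 6), e(3, 4)},  # cross-group edges only
}
partition = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
scores = score_layer_subsets(layers, partition)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # ('x', 'y') 0.5
```

Each single layer scores at most 0 here, and adding z to the pair brings the score back to 0, which mirrors the pattern sketched in Figure 8.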

A preliminary work in this direction that can be seen as a conjunction between the idea 
of emerging clusters and the flattening approach is discussed by Rocklin & Pinar (2011). 
This work proposes an algorithm to find a vector that weights the layers to aggregate them 
such that the clustering of the resulting flattened graph is as similar to a given ground-truth 
clustering as possible (the clustering algorithm and a similarity measure between weighted 
single-layer graphs are given for this problem). The second half of the paper deals with 
the rich clustering structure that the multi-typed edges can provide. By generating random aggregates of the graph, the authors explore the space of possible clusterings and study, e.g., whether good graph clusterings are themselves clustered in this space. The final problem that they tackle
is how to give an efficient representation of this resulting meta-clustering. Their approach 
is to reduce each meta-cluster (of clusterings) into a single representative clustering and 
select a small number of them to cover the meta-clustering space. In this way, they provide 
a set of diverse and non-redundant clusterings as output. 
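The aggregation step can be sketched as a weighted sum of the layers' edge sets. The code below is not Rocklin & Pinar's optimization procedure: it only shows how a weight vector turns a multi-layer graph into a single weighted graph, and how random weight vectors (normalized to sum to 1, which is our assumption) produce the random aggregates whose clusterings can then be compared. The layer names are hypothetical.

```python
import random


def aggregate(layers, weights):
    """Weighted flattening: w(u, v) = sum over layers l of weights[l] * [edge (u, v) in layer l]."""
    flat = {}
    for name, edges in layers.items():
        w = weights.get(name, 0.0)
        if w == 0.0:
            continue  # a zero weight drops the layer entirely
        for edge in edges:
            flat[edge] = flat.get(edge, 0.0) + w
    return flat


def random_aggregates(layers, n_samples, seed=0):
    """Sample random layer-weight vectors (normalized to sum to 1) and the graphs they induce."""
    rng = random.Random(seed)
    names = sorted(layers)
    samples = []
    for _ in range(n_samples):
        raw = [rng.random() for _ in names]
        total = sum(raw)
        weights = {name: r / total for name, r in zip(names, raw)}
        samples.append((weights, aggregate(layers, weights)))
    return samples


layers = {
    "email": {frozenset((1, 2))},
    "cowork": {frozenset((1, 2)), frozenset((2, 3))},
}
agg = aggregate(layers, {"email": 0.5, "cowork": 0.25})
samples = random_aggregates(layers, 5)
```

Each sampled weighted graph would then be clustered with a weighted-graph algorithm, and the resulting clusterings compared with each other or with a ground truth.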


3 Clustering node-attributed graphs 


According to the taxonomy presented by Getoor & Diehl (2005), node-attributed graph 
clustering aims at detecting groups of nodes sharing common characteristics considering 
both their attributes and their position in the graph. Most of the works addressing this 
problem are based on partitioning and homophily: nodes can belong to one and only one 
group, and nodes in the same group must have homogeneous values on their attributes. A
few other methods, also covered here, generate overlapping clusters, e.g., by considering 
different combinations of the attributes. This last approach is usually known as subspace 
clustering. 


3.1 Data representation 


As with edge attributes, when node attributes are considered the literature abounds with terminologies and models that depend on the research field or the purpose of the work, making it difficult to provide a unified view. However, some main options emerge.


As previously mentioned, Wasserman & Faust (1994) describe multiple dimensions that can be represented in a social network model: a structural dimension (relationships among actors), a compositional dimension (attributes of the single actors), and an affiliation dimension (representing group memberships). Affiliation information often refers to known groups such as clubs or companies, but it can also represent the cluster memberships discovered through a clustering process.

Two main options to represent such a model are shown in Figure 9. The first one, Figure 9(a), consists in extending a structural graph with tuples describing node properties. This can be formally expressed as a triple $G = (V, E, F)$ where each node $v$ is associated with a set of $a$ attributes (or a feature vector) $[f_1(v), \ldots, f_a(v)]$, storing its compositional dimension. Note that the affiliation information may be stored in the same way, by adding attributes dedicated to memberships. The second option, Figure 9(b), consists in superimposing one or more graphs where additional nodes represent either specific attribute values or groups. Structurally, this superimposed graph is bipartite because it connects individuals to groups, without edges between groups or between users (the latter are stored in the original social network). More formally, a graph $G_p = (V_p, E_p)$ is augmented by a bipartite graph $G_a = (V_p \cup V_a, E_a)$, connecting nodes of $V_p$ to attribute nodes of $V_a$, with no links between attributes: $E_a \subseteq V_p \times V_a$. This defines an augmented graph $G = (V, E)$ with $E = E_p \cup E_a$ and $V = V_p \cup V_a$.
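As a minimal sketch of how option (a) can be turned into option (b), the function below creates one attribute node per distinct (attribute, value) pair and links each actor to the attribute nodes it holds, so that $E_a \subseteq V_p \times V_a$ by construction. The actor names are hypothetical, the attribute values echo those of Figure 9(b), and we assume actor identifiers are never 2-tuples, so they cannot collide with attribute nodes.

```python
def augment(structural_edges, attributes):
    """Turn option (a) (attribute tuples) into option (b) (augmented graph).

    structural_edges: set of (u, v) pairs between actors (E_p).
    attributes: dict actor -> {attribute name: value}.
    Attribute nodes are (name, value) tuples; actor ids are assumed not to be 2-tuples.
    Returns (V, E, V_p, V_a, E_a).
    """
    v_p = set(attributes) | {u for edge in structural_edges for u in edge}
    v_a = set()
    e_a = set()
    for actor, feats in attributes.items():
        for name, value in feats.items():
            attr_node = (name, value)
            v_a.add(attr_node)
            e_a.add((actor, attr_node))  # only actor-to-attribute links, never attribute-to-attribute
    return v_p | v_a, set(structural_edges) | e_a, v_p, v_a, e_a


people = {
    "ann": {"city": "Paris", "age": "20-29"},
    "bob": {"city": "Paris", "age": "30-39"},
}
V, E, v_p, v_a, e_a = augment({("ann", "bob")}, people)
```

In the result, ann and bob are connected both directly and through a path of length 2 via the shared node ("city", "Paris"), which is what walk-based methods on this representation exploit.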

Several terms have been used in the literature to refer to the options presented in Figures 9(a) and 9(b), or even to their intermediate variations. To make access to the existing literature easier, in Table 2 we report the main terms together with the references where they appear and an indication of which modeling option has been adopted. Our objective here is not to be exhaustive: we aim at capturing the relationships between different approaches. For example, when Tong et al. (2007) refer to an attribute graph, they imply that they have previously grouped the nodes with common attributes, and propose a meta-graph where meta-nodes reflect those groups and edge weights represent group-to-group similarity.


Zheleva et al. (2009) study social and affiliation networks by keeping two distinct graphs and observing the co-evolution of these two graphs via their common nodes, using data retrieved from Flickr groups. In the machine learning field, in the late 1990s and early 2000s, workshops dedicated to link mining referred to relational data (Neville et al., 2003). In a more recent data warehousing context, Zhao et al. (2011) introduced an OLAP graph cube for multidimensional networks.
[Figure 9(b) attribute nodes: age groups 20-29, 30-39, 40-49, 50-59; cities London, Stratford, Paris, Rennes, Brest]

Fig. 9. (a) Attributes represented as tuples describe node properties. The similarity/distance between tuples can be integrated into the graph and used during the clustering process. (b) New nodes representing the additional information are added to the original graph, resulting in a heterogeneous structure with multiple node types.


Table 2. Some terminology used in the literature to refer to node-attributed graphs

term                                         references                                                        option
Social-attribute network                     (Yin et al., 2010a)                                               (b)
Attribute augmented graph                    (Zhou et al., 2009, 2010)                                         (b)
Attributed graph                             (Zhou et al., 2009; Cruz et al., 2013; Cruz & Bothorel, 2013)     (a)
Feature-vector graph, Vertex-labeled graph   (Günnemann et al., 2013)                                          (a)


In summary, there is no consensus on the model yet. While different formats are useful to emphasize different aspects, all models include both structural and compositional data, and one can be derived from another. Therefore, to introduce existing methods, we will use a common model consisting of an attributed graph $G = (V, E, F)$ where each node $v$ is associated with an attribute vector $F(v)$.


3.2 Weight modification according to node attributes 

The first class of methods we present is based on the following idea: first the node-attributed 
graph is reduced to a single weighted graph, where weights represent attribute similarity. 
Then, any clustering algorithm for weighted graphs can be applied in principle. Different 
methods use alternative functions to compute node similarity and to update edge weights 
when similarities have been computed. In all these approaches, however, the modified weights bias the clustering algorithm toward groups in which the nodes are not only well connected but also similar.
Fig. 10. (a) A node-attributed graph and (b) an attribute-free representation of the same graph where attribute similarities are stored in the edge weights. Thicker edges indicate a higher weight, i.e., a stronger connection.


Table 3. Variations of the weight modification approach

reference                        similarity                       clustering
(Neville et al., 2003)           Matching coefficient             Karger’s Min-Cut; MajorClust; Spectral
(Steinhaeuser & Chawla, 2008)    Extended matching coefficient    Assign u and v to the same cluster when the weight of (u,v) is above a given threshold
(Cruz et al., 2011b, 2012)       Self-organizing maps             Louvain


As an example, consider Figure 10. Focusing solely on the attributes, nodes {1,2,3,4,7} would form a homogeneous cluster, well separated from nodes {5,6}. If we only consider the structure of the graph, two clear clusters emerge (nodes {1,2,3} and nodes {4,5,6,7}). These two pieces of information are summarized in the weighted graph in (b). While the specific final clusters depend on the assigned weights, we can see the emergence of a cluster made of nodes {1,2,3,4}, presenting both structural and compositional similarities and otherwise difficult to identify. Table 3 summarizes the main works adopting this strategy; the measures mentioned in the table are described below.

For example, Neville et al. (2003) use the matching coefficient similarity metric $S_{ij}$, which counts the attributes $k$ on which the two nodes take the same value. This similarity metric is expressed as


$$S_{ij} = \begin{cases} \sum_k s_k(i,j) & \text{if } e_{ij} \in E \text{ or } e_{ji} \in E \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where

$$s_k(i,j) = \begin{cases} 1 & \text{if } k_i = k_j \\ 0 & \text{otherwise} \end{cases}$$
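Eq. (3) translates almost directly into code. The sketch below assumes that each node's attributes are stored as a dictionary and edges as a set of ordered pairs; node ids and attribute values are hypothetical. Note that an edge whose endpoints share no attribute value receives weight 0, which effectively removes it before clustering.

```python
def matching_coefficient(i, j, features, edges):
    """S_ij of Eq. (3): number of attributes on which i and j take the same value,
    computed only for pairs connected by an edge in either direction."""
    if (i, j) not in edges and (j, i) not in edges:
        return 0
    fi, fj = features[i], features[j]
    # s_k(i, j) = 1 when both nodes have attribute k with the same value
    return sum(1 for k in fi if k in fj and fi[k] == fj[k])


def reweight(features, edges):
    """Replace every edge weight with its matching coefficient."""
    return {(i, j): matching_coefficient(i, j, features, edges) for (i, j) in edges}


features = {
    1: {"city": "Paris", "job": "dev"},
    2: {"city": "Paris", "job": "ops"},
    3: {"city": "Brest", "job": "ops"},
    4: {"city": "Paris", "job": "dev"},
}
edges = {(1, 2), (2, 3), (1, 4)}
weights = reweight(features, edges)
```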


Once the weights have been changed, the graph is clustered using one of the three methods reported in Table 3: Karger’s Min-Cut (Karger, 1993), MajorClust (Stein & Niggemann, 1999) or spectral clustering with a normalized cut objective function (Shi & Malik, 2000). In experiments with artificial datasets, spectral clustering appears to be robust to irrelevant attributes and to graphs with low linkage.


Steinhaeuser & Chawla (2008) extend the matching coefficient computation to take both discrete and continuous attributes into account: for discrete attributes, each common attribute value shared by two nodes increments the weight of $e(u,v)$ by 1; for continuous attributes, the idea is to add the normalized distance between the attribute values. Once the weights have been changed and normalized, all nodes connected by an edge whose weight is greater than a threshold $t$ are assigned to the same cluster. In this specific work, the quality of the final partition is evaluated using modularity (Newman & Girvan, 2004).
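The weighting-plus-threshold scheme can be sketched with a union-find structure: every edge whose normalized weight exceeds $t$ merges its endpoints, so clusters are the connected components of the thresholded graph. Two details are our assumptions rather than the authors' exact choices and are labeled in the code: each continuous attribute contributes 1 minus the range-normalized difference (so that closer values add more weight), and weights are normalized by dividing by the maximum edge weight. The data is hypothetical.

```python
def extended_weight(fu, fv, discrete, ranges):
    """Extended matching coefficient between two feature dicts.

    discrete: names of categorical attributes (+1 for each shared value).
    ranges: {name: (min, max)} for continuous attributes.
    ASSUMPTION: a continuous attribute contributes 1 - |x - y| / (max - min).
    """
    w = 0.0
    for k in discrete:
        if fu[k] == fv[k]:
            w += 1.0
    for k, (lo, hi) in ranges.items():
        span = hi - lo
        w += (1.0 - abs(fu[k] - fv[k]) / span) if span else 1.0
    return w


def threshold_clusters(nodes, edges, features, discrete, ranges, t):
    """Merge the endpoints of every edge whose normalized weight exceeds t."""
    parent = {v: v for v in nodes}

    def find(v):  # union-find lookup with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    weights = {(u, v): extended_weight(features[u], features[v], discrete, ranges)
               for (u, v) in edges}
    max_w = max(weights.values(), default=0.0)  # ASSUMPTION: normalize by the maximum weight
    for (u, v), w in weights.items():
        if max_w > 0 and w / max_w > t:
            parent[find(u)] = find(v)
    clusters = {}
    for v in nodes:
        clusters.setdefault(find(v), set()).add(v)
    return list(clusters.values())


features = {
    "a": {"city": "Paris", "age": 30}, "b": {"city": "Paris", "age": 32},
    "c": {"city": "Brest", "age": 60}, "d": {"city": "Brest", "age": 58},
}
clusters = threshold_clusters(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")],
                              features, ["city"], {"age": (30, 60)}, t=0.5)
```

Here the bridge (b, c) joins nodes with different cities and distant ages, so its normalized weight stays far below the threshold and the graph splits into {a, b} and {c, d}.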

The approach presented by Cruz et al. (2011b, 2012) deals with the fact that not all attributes may be relevant to determine the similarity between nodes. When too many attributes are involved in the computation of traditional distance functions, e.g., Euclidean distance, we lose the ability to discriminate between different nodes. In fact, the so-called curse of dimensionality materializes in that all distances tend to converge to the same value. In addition, some attributes may need to be combined or transformed to become relevant. Therefore, the authors use a classical machine learning approach developed by Kohonen (1997), known as the self-organizing map (SOM)³, to find the latent information useful for establishing the similarity between the nodes. An edge between two nodes from the same SOM cluster gets its weight strengthened proportionally to a given constant $\alpha$. The resulting weighted graph is finally clustered using the Louvain method (Blondel et al., 2008), and the overall complexity is linear, $O(n) + O(fn) + O(m)$, where $n$ is the number of nodes, $f$ the number of attributes or features and $m$ the number of edges. Additionally, the authors introduce the notion of point of view: by manually selecting subsets of attributes, it becomes possible to analyze the social network from different perspectives.

It is worth noticing that this family of techniques produces new edge weights according to node attributes. If the original social graph is also weighted, the two kinds of weights must be combined in some way, e.g., by multiplying them.
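The two ideas above, strengthening edges whose endpoints fall in the same attribute-space cluster and multiplying the attribute-derived factor with a pre-existing weight, fit in a few lines. This is a hypothetical sketch: the attribute clusters are taken as given rather than learned with a SOM, and the $(1 + \alpha)$ factor is our reading of "strengthened proportionally to a given constant $\alpha$", not necessarily the authors' exact update rule.

```python
def reinforce_and_combine(weighted_edges, attr_cluster, alpha):
    """Combine original edge weights with an attribute-based factor.

    weighted_edges: {(u, v): original weight}.
    attr_cluster: {node: attribute-space cluster id}, assumed precomputed
    (e.g., by a SOM, which is not implemented here).
    ASSUMPTION: edges inside an attribute cluster get factor (1 + alpha),
    other edges keep factor 1; the factor multiplies the original weight.
    """
    combined = {}
    for (u, v), w in weighted_edges.items():
        factor = (1.0 + alpha) if attr_cluster[u] == attr_cluster[v] else 1.0
        combined[(u, v)] = w * factor
    return combined


combined = reinforce_and_combine({("a", "b"): 2.0, ("b", "c"): 1.0},
                                 {"a": 0, "b": 0, "c": 1}, alpha=0.5)
```

The resulting weighted graph could then be handed to any weighted community detection algorithm, such as the Louvain method.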


3.3 Linear combination of attributes and structural dimensions 

The previous family of methods removes node attributes by storing their information inside the edges of the graph. Some studies adopt the opposite approach, which consists in removing the network: structural information is stored into a similarity (or distance) function between nodes. After defining this function, classic distance-based clustering methods can be applied. As an example, Combe et al. (2012) define a distance between nodes which is given by

$$d_{TS}(i,j) = \alpha \cdot d_T(i,j) + (1 - \alpha) \cdot d_S(i,j) \qquad (4)$$

where $d_T(i,j)$ and $d_S(i,j)$ are the attribute and structural distances, respectively, between nodes $i$ and $j$, and $0 < \alpha < 1$ is a weighting factor. The authors leave the choice of the clustering method open. Another similar function by Dang & Viennet (2012), listed in Table 4, is used to build a k-nearest-neighbor graph in order to find clusters using the Louvain method (Blondel et al., 2008).

Table 4. Similarity or distance functions combining structural and compositional dimensions

reference                        similarity or distance
(Combe et al., 2012)             $\alpha \cdot d_T(i,j) + (1 - \alpha) \cdot d_S(i,j)$
(Villa-Vialaneix et al., 2013)   $\alpha_0 K_0(i,j) + \sum_d \alpha_d K_d(c_i^d, c_j^d)$
(Dang & Viennet, 2012)           $\alpha \cdot G_{ij} + (1 - \alpha) \cdot simA(i,j)$

3 Self-organizing maps have been proposed as a learning approach that is robust to noise and can map high-dimensional data, e.g., text, into low-dimensional spaces.
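Eq. (4) leaves the choice of $d_T$ and $d_S$ open. The sketch below uses the Euclidean distance for attributes (which assumes features are already on comparable scales) and, as our own assumption, the Jaccard distance between closed neighborhoods for structure. The small graph and feature vectors are hypothetical.

```python
import math


def attribute_distance(x, y):
    """d_T: Euclidean distance between (pre-scaled) attribute vectors."""
    return math.dist(x, y)


def structural_distance(adj, i, j):
    """d_S: ASSUMED here to be the Jaccard distance between closed neighborhoods."""
    ni, nj = adj[i] | {i}, adj[j] | {j}
    return 1.0 - len(ni & nj) / len(ni | nj)


def combined_distance(adj, features, i, j, alpha):
    """Eq. (4): alpha * d_T(i, j) + (1 - alpha) * d_S(i, j)."""
    return (alpha * attribute_distance(features[i], features[j])
            + (1 - alpha) * structural_distance(adj, i, j))


adj = {1: {2}, 2: {1, 3}, 3: {2}}
features = {1: (0.0, 0.0), 2: (0.0, 1.0), 3: (3.0, 4.0)}
d13 = combined_distance(adj, features, 1, 3, alpha=0.5)
```

Computing this function for every pair yields a distance matrix that any distance-based method (hierarchical clustering, k-medoids) can consume. Because the two terms live on different scales, the effective balance depends on $\alpha$ and on how the features were scaled.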

The main feature of these approaches is that nodes which are structurally far from each other in the social graph can turn out to be close if their attribute values are similar. As a consequence, and depending on the distance-based clustering method, clusters may contain disconnected portions of the graph. Hanisch et al. (2002) experiment with a similar approach on biological networks and gene expression data. After the computation of the combined distance, they apply hierarchical clustering and a statistical measure to define the cutting point of the dendrogram.

While Villa-Vialaneix et al. (2013) share a similar purpose, using a weighting parameter to balance their components, they rely on kernels to map the original (multi-space) data into an (implicit and unique) Euclidean space where SOMs can be used. In this case, the authors define a multi-kernel similarity function to combine composition and structure, as indicated in Table 4, where $K_0(i,j)$ is the kernel measuring structural similarity, $c_i^d$ is the $d$th label of node $i$ and the $\alpha_d$ are weighting factors.

This approach also exploits the visual potential of SOMs, which can be represented as two-dimensional grids. In such grids, each cell represents a group of nodes, and the size of each cell is proportional to the number of observations associated with it. In this way, the authors are able to represent the size of the communities, the distribution of topics and the links in the same two-dimensional representation.


Dang & Viennet (2012) propose an extension of the Louvain method with a modification of modularity to include the similarity of the attributes in the community discovery process. This is given by

$$Q = \sum_{C \in \mathcal{C}} \sum_{i,j \in C} \left( \alpha \cdot S(i,j) + (1 - \alpha) \cdot simA(i,j) \right) \qquad (5)$$

where $\mathcal{C}$ indicates the partition of the graph (its set of clusters), $S(i,j)$ represents the strength between two nodes (computed as in the original definition of modularity), and $simA(i,j)$ is a similarity function based on the attributes of $i$ and $j$ that can be adapted according to how the attributes are represented. $0 < \alpha < 1$ is a weighting factor.
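Eq. (5) can be evaluated directly for a given partition. In this sketch, $S(i,j)$ is taken to be the classical modularity term $(A_{ij} - k_i k_j / 2m) / 2m$, which is our reading of "computed as in the original definition of modularity", and $simA$ is any user-supplied function; its scale relative to $S$ is left to the analyst. The two-triangle graph is a hypothetical example: with $\alpha = 1$ the value reduces to ordinary modularity (0.5 for two disjoint triangles).

```python
def attributed_modularity(adj, partition, sim_a, alpha):
    """Eq. (5): sum over clusters C and ordered pairs i, j in C of
    alpha * S(i, j) + (1 - alpha) * simA(i, j).

    adj: {node: set of neighbours} for an undirected, unweighted graph.
    ASSUMPTION: S(i, j) = (A_ij - k_i * k_j / 2m) / 2m, the classical modularity term.
    sim_a: user-supplied attribute similarity function.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())
    clusters = {}
    for node, c in partition.items():
        clusters.setdefault(c, []).append(node)
    q = 0.0
    for members in clusters.values():
        for i in members:
            for j in members:
                a_ij = 1.0 if j in adj[i] else 0.0
                s_ij = (a_ij - len(adj[i]) * len(adj[j]) / two_m) / two_m
                q += alpha * s_ij + (1 - alpha) * sim_a(i, j)
    return q


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5, 6}, 5: {4, 6}, 6: {4, 5}}
partition = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
q_struct = attributed_modularity(adj, partition, lambda i, j: 0.0, alpha=1.0)
```

A Louvain-style extension would evaluate the change of this quantity when moving a node between clusters, rather than recomputing it from scratch as done here.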

In general, for parametric methods an important question is how to choose $\alpha$. According to the authors of these methods, clusters are stable against small changes in the parameter. Dang & Viennet (2012) also propose a way to estimate $\alpha$, and kernel-based approaches support automated parameter tuning (Villa-Vialaneix et al., 2013). Depending on the application, analysts may also set $\alpha$ to emphasize attribute homophily or connectivity. However, more case studies and independent analyses would be welcome.


3.4 Walk-based approaches 

A random walk on a possibly infinite network is a stochastic process where a walker goes 
from node to node by choosing a target neighbor at random at each step (Noh & Rieger, 
2004). In the clustering context, walk models are used to estimate vertex distances on attributed graphs. Based on this distance, k-means-like approaches attract close nodes around k predefined centroids in order to aggregate the members of the communities.

Zhou et al. (2009) define a random walk process on graphs like the one in Figure 9(b). As a result, the more attribute values two vertices share, the more paths exist between them via the common attribute nodes. In this way, random walks can be used to measure vertex proximity through both the structural links and the compositional links.
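The effect of attribute nodes on walk-based proximity can be checked numerically. As an assumption in the spirit of a random walk with restart (not a reproduction of Zhou et al.'s exact definition), the sketch sums l-step transition probabilities for l = 1..steps, weighted by $c(1-c)^l$, with a hypothetical restart probability c = 0.2. The toy network is also hypothetical: p2 is a hub linking three people, and p1 and p3 additionally share the attribute node "city=Paris".

```python
def transition_matrix(nodes, edges):
    """Row-stochastic transition matrix of a simple random walk on an undirected graph."""
    index = {v: k for k, v in enumerate(nodes)}
    neighbours = {v: set() for v in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(nodes)
    P = [[0.0] * n for _ in range(n)]
    for v in nodes:
        for u in neighbours[v]:
            P[index[v]][index[u]] = 1.0 / len(neighbours[v])
    return P, index


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def walk_proximity(nodes, edges, steps=3, c=0.2):
    """R = sum over l = 1..steps of c * (1 - c)^l * P^l (ASSUMED restart-style weighting)."""
    P, index = transition_matrix(nodes, edges)
    n = len(nodes)
    R = [[0.0] * n for _ in range(n)]
    P_l = P
    for l in range(1, steps + 1):
        coef = c * (1 - c) ** l
        for r in range(n):
            for s in range(n):
                R[r][s] += coef * P_l[r][s]
        P_l = matmul(P_l, P)
    return R, index


people = ["p1", "p2", "p3", "p4"]
social = [("p1", "p2"), ("p2", "p3"), ("p2", "p4")]
attr_links = [("p1", "city=Paris"), ("p3", "city=Paris")]
R_plain, idx = walk_proximity(people, social)
R_aug, idx_aug = walk_proximity(people + ["city=Paris"], social + attr_links)
prox_plain = R_plain[idx["p1"]][idx["p3"]]
prox_aug = R_aug[idx_aug["p1"]][idx_aug["p3"]]
```

With the shared attribute node, the walker can reach p3 from p1 through "city=Paris" as well as through the hub, so `prox_aug` exceeds `prox_plain`.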

In the Connected k Centers method proposed by Ge et al. (2008), the walk strategy is a simple breadth-first search (BFS) defined for graphs like the one in Figure 9(a), where the feature vector is also used to determine the next visited node. This method implements the k-means algorithm using walks to compute distances: first, it picks k random nodes as cluster centers; second, all the nodes are assigned to one of the k clusters by traversing the graph using BFS; third, the centroids of the clusters are recalculated. The second and third steps are repeated until the cluster centroids no longer change.
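The three steps above can be sketched as follows. This is a simplified illustration, not Ge et al.'s exact algorithm: the assignment is a multi-source BFS from the current centers, a node reached by several clusters at the same BFS level joins the one whose center is closest in feature space (our tie-breaking assumption), and since BFS must start from a node, each centroid is replaced by the cluster's medoid in feature space. Nodes in components that contain no center stay unassigned. The graph and features are hypothetical.

```python
import math
import random


def bfs_assign(adj, features, centers):
    """Multi-source BFS from the centers; every cluster stays connected by construction."""
    assign = {c: idx for idx, c in enumerate(centers)}
    frontier = list(centers)
    while frontier:
        candidates = {}
        for u in frontier:
            for w in adj[u]:
                if w not in assign:
                    candidates.setdefault(w, set()).add(assign[u])
        # Tie-break between clusters reaching w at the same level by feature distance.
        for w, clusters in candidates.items():
            assign[w] = min(sorted(clusters),
                            key=lambda k: math.dist(features[w], features[centers[k]]))
        frontier = list(candidates)
    return assign


def medoid(members, features):
    """Member minimizing the total feature distance to the other members."""
    return min(sorted(members),
               key=lambda v: sum(math.dist(features[v], features[u]) for u in members))


def connected_k_centers(adj, features, k, seed=0, max_iter=20):
    rng = random.Random(seed)
    centers = rng.sample(sorted(adj), k)  # step 1: k random nodes as centers
    assign = {}
    for _ in range(max_iter):
        assign = bfs_assign(adj, features, centers)  # step 2: BFS assignment
        clusters = [[v for v, c in assign.items() if c == idx] for idx in range(k)]
        new_centers = [medoid(members, features) for members in clusters]  # step 3
        if new_centers == centers:
            break
        centers = new_centers
    return assign, centers


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
features = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (10, 10), 5: (10, 11), 6: (11, 10)}
assign = bfs_assign(adj, features, [1, 6])
full, centers = connected_k_centers(adj, features, k=2)
```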


3.5 Methods based on statistical inference 

Statistical inference is the process of drawing properties of a dataset from a set of observations in a model, and then inferring predictions about a larger population represented by the sample. In this section, following the classification provided by Fortunato (2010), we focus on two types of methods: those using generative models, either as an intermediary step or directly, to mix attributes and links in a unified model, and those using stochastic block models.

Many studies focus on the task of clustering networks of documents. Here, every document can be seen as a node characterized by a complex attribute defined by the words
contained in the document. For example, Li et al. (2008) propose a clustering method to 
find communities in a large-scale document corpus exploiting both the document content 
(the words), and their references/citations. They use statistical inference as an intermediate 
step to find hidden topics to further manipulate the documents. The general principle is to 
find community cores and then include their members. The detection of cores identifies the 
documents that are frequently co-referenced and may play the role of community seeds. 
A second phase merges the initial cores according to their topic similarity in order to improve the core consistency. Here the authors use the well-known text-mining method called Latent Dirichlet Allocation (LDA) to find topics. LDA is a generative topic model in which unobserved (latent) topics generate the observed words with given probabilities. Bayesian inference finds the best fit of the model to the observations through likelihood maximization. Finally, the third step affiliates the remaining documents to the clusters. This affiliation propagation process may lead to misclassified documents, and a final step removes false hits.


LDA is also used by Liu et al. (2009) and Balasubramanyan & Cohen (2011), but as a central approach and in an extended manner to identify latent groups. The Topic-Link LDA model defined by Liu et al. (2009) is a generative model considering topics, membership of authors and link formation between pairs of documents exhibiting both topic similarity and community closeness. The inference is designed to regularize the topic information when inferring the hidden communities and vice versa. The authors maximize likelihood using an expectation-maximization algorithm and demonstrate their unified model on three different tasks: topic modeling, community detection and link prediction in blog and CiteSeer datasets. For the community detection task, one remark is worth highlighting. Their approach offers a meaningful investigation of how content similarity and community similarity contribute to the formation of links. They reveal that author membership has a much stronger effect on link formation between blog posts in political domains than between technical papers. They also show that the topic dimension plays a more important role than community similarity in blog citing. Balasubramanyan & Cohen (2011) also address the problem of link modeling and combine two popular methods: block modeling and LDA.


Xu et al. (2012) propose a community detection model that is transformed into a statistical inference problem. The authors start by defining a generative Bayesian model over all the possible combinations of a graph, defined by its adjacency matrix $X$, a matrix of features $Y$ and a vector $Z$ containing the assignment of each node to one of $k$ groups, i.e., a partition of the graph. This model produces a joint probability $p(X, Y, Z)$. The idea is thus to find a partition $Z^*$ such that $Z^* = \arg\max_Z p(Z \mid X, Y)$.

These techniques are attractive because they mix attributes and topology in the same model, but unfortunately the optimization process to estimate the parameters of the likelihood is often costly. In addition, although they do not rely on the definition of any distance, choosing the prior distributions of the statistical models requires non-trivial expertise.


3.6 Subspace-based methods 

Some of the clustering approaches reviewed so far share the belief that a careless use of all the available attributes may lead to poor clusterings. This is the case, e.g., in the work by Villa-Vialaneix et al. (2013). We have already recalled the phenomenon called curse of dimensionality in Section 3.2: when the number of attributes is large, the relative difference between the distances of two random pairs of data points (actors, in this case) tends to zero. This phenomenon motivates the development of clustering approaches focused on the identification of the discriminative attributes to produce well-separated clusters. This general approach is known as subspace clustering, and has also been applied to the case of node-attributed graphs. Subspace clustering methods are designed to select the ‘best’ subsets of dimensions. They search the projections of the data in different dimensions and identify clusters that are relevant locally to some of these subspaces.

Subspace clustering is interesting because it may reveal groups that would not be detected considering the entire set of attributes. However, finding relevant projections is computationally hard. The final choice of which groups to keep is also costly and requires an optimization step combining the best size, density, entropy, dimensionality and any other relevant quality function (see Section 4.1). Moreover, as each cluster is relevant in its own subspace, this approach produces overlapping clusters and requires additional efforts to control the redundancy ratio between them.
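A toy enumeration shows both the appeal and the cost discussed above. For every non-empty subset of attributes, the sketch keeps only the edges whose endpoints agree on all attributes of the subset and reports the connected components with at least `min_size` nodes. It is not GAMer or any published method: there is no density criterion and no redundancy control, it assumes every node has a value for every attribute, and it visits all $2^a - 1$ subspaces. The data is hypothetical.

```python
from itertools import combinations


def subspace_clusters(adj, features, min_size=2):
    """For every attribute subset, report connected components of the edges whose
    endpoints agree on all attributes of the subset.
    ASSUMPTION: every node has a value for every attribute.
    """
    attrs = sorted({a for f in features.values() for a in f})
    results = []
    for r in range(1, len(attrs) + 1):
        for dims in combinations(attrs, r):  # 2^|attrs| - 1 subspaces in total
            seen = set()
            for start in sorted(adj):
                if start in seen:
                    continue
                component, stack = set(), [start]
                seen.add(start)
                while stack:
                    u = stack.pop()
                    component.add(u)
                    for w in adj[u]:
                        if w not in seen and all(features[u][a] == features[w][a] for a in dims):
                            seen.add(w)
                            stack.append(w)
                if len(component) >= min_size:
                    results.append((dims, frozenset(component)))
    return results


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
features = {
    1: {"city": "Paris", "job": "dev"},
    2: {"city": "Paris", "job": "dev"},
    3: {"city": "Paris", "job": "ops"},
}
found = subspace_clusters(adj, features)
```

The group {1, 2} is reported in two different subspaces, ("job",) and ("city", "job"), which is exactly the kind of redundancy that a final selection step must control.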

One semi-automated approach to identify relevant subsets of attributes has been presented by Cruz et al. (2011b), where the authors propose a framework helping human analysts to manually select their preferred compositional perspective. The chosen subset of attributes is given explicitly as an input to an automatic clustering process.

In contrast, Günnemann et al. (2013) propose a completely automated method to efficiently combine subspace and subgraph clusters. In particular, they use their former GAMer method to extract an exhaustive list of candidate clusters, but apply a different final selection of the clusters to be returned to the user. The GAMer method greedily selects the clusters that locally optimize a quality measure. Here, they propose a solution based on global optimization, maximizing the sum of the clusters’ qualities under redundancy constraints. Computing this clustering is #P-hard⁴. Therefore, the authors propose a heuristic that, for example, produces a clustering of the whole DBLP database⁵ in about 7 hours with commonly available hardware. They also show that the quality remains comparable to the greedy solution computed by GAMer in terms of F1 value and density.

The time complexity of subspace clustering approaches is notoriously high, but the discovery of dense subgraphs in selected subspaces can be valuable. However, the high number of required input parameters (minimum cluster size, dimensionality, density, redundancy) can have a negative impact on the practical usability of these methods. Finally, as we will see in Section 4.1, the evaluation of attributed graph clusters in general is still under study, especially for overlapping clusters, where no ground truth exists.


3.7 Other methods 


Other works directly extend well-known and efficient graph-based methods. Cruz et al. (2011a) extend the Louvain method (Blondel et al., 2008) introducing a local minimization
of the entropy generated by the attributes between the modularity optimization and the 
community aggregation steps. Dang & Viennet (2012) also extend the Louvain method 
in a similar way, by optimizing at each iteration the linear combination of the classical 
modularity and a new modularity based on the attributes. 


4 This is the complexity class of some hard counting problems; it implies that an exact solution to this problem cannot currently be computed in acceptable time.

5 133 097 nodes; 631 384 edges; 2 695 attribute dimensions. Available at: http://dblp.uni-trier.de 
Akoglu et al. (2012) propose a method to identify cohesive groups in attributed graphs composed of $n$ nodes, each described by a feature vector. In this case, the attributes are binary, i.e., a node either has a given attribute or not. The algorithm uses the adjacency matrix $A_{n \times n}$ of the graph and a matrix $F_{n \times f}$ representing the features of each node. The main underlying idea is to find $k$ groups of nodes using the structural information and $l$ groups using the feature information. The cost function is based on the encoding of the matrices $A$ and $F$ as well as the configuration of the clusters, where the encoding uses the approach proposed by Rissanen (1983).


Barbieri et al. (2013) present an approach using the notion of information cascades, and in particular the idea that an information cascade is more likely to occur within a community than between communities. Thus, they use a given set of information cascades to build a probabilistic model named Community-Cascade Network (CCN). To learn the parameters of the model, the authors use an expectation-maximization approach, which, however, has been reported to be computationally expensive.


Ruan et al. (2013) also propose a content- and structure-based community detection algorithm called CODICIL. The algorithm starts by creating an edge set from the structure and from a graph generated from the similarity of the nodes, i.e., the final edge set contains the original structural edges plus edges linking each node to its top k most similar neighbors. This similarity is calculated as the cosine similarity between the TF-IDF vectors built from the content of each node. Then, this new graph is sampled to select certain relevant edges and, at last, the sampled graph is clustered using a classic graph clustering technique.
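The edge-union step of CODICIL can be sketched with the standard library alone. The sketch covers only that step (the sampling and final clustering are omitted), and the whitespace tokenization, the unsmoothed idf $\log(N/df)$ and the tie-breaking among equally similar neighbors are our assumptions. The documents and node ids are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors with raw term counts and idf = log(N / df) (ASSUMED weighting)."""
    n = len(docs)
    tokenized = {v: text.lower().split() for v, text in docs.items()}
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    return {v: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
            for v, tokens in tokenized.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def content_edges(docs, k):
    """Link every node to its k most similar nodes (positive similarity only)."""
    vecs = tfidf_vectors(docs)
    edges = set()
    for v in vecs:
        sims = sorted(((cosine(vecs[v], vecs[u]), u) for u in vecs if u != v), reverse=True)
        for s, u in sims[:k]:
            if s > 0:
                edges.add(frozenset((u, v)))
    return edges


def codicil_edge_union(structural_edges, docs, k):
    """Original structural edges plus top-k content edges, as undirected frozensets."""
    return {frozenset(e) for e in structural_edges} | content_edges(docs, k)


docs = {
    1: "graph clustering communities",
    2: "graph clustering attributes",
    3: "cooking pasta recipes",
    4: "pasta sauce recipes",
}
edges = codicil_edge_union({(2, 3)}, docs, k=1)
```

The content edges {1, 2} and {3, 4} reinforce the two topical groups, while the single structural edge (2, 3) is kept as is; a sampling step would then decide which edges survive before clustering.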

Finally, some approaches focus on the discovery of significant patterns, such as association rules or regular structures in graphs. Significant examples are the works by Moser et al. (2009), Silva et al. (2010), Atzmueller & Mitzlaff (2011) and Pool et al. (2014), focusing on mining descriptive community patterns and allowing analysts to understand the structure of frequent subgraphs around topics, which may be useful in scenarios like fraud detection or counter-terrorism. Unlike graph partitioning methods, frequent patterns can overlap and do not necessarily cover the entire dataset.


4 Practical aspects 
4.1 Evaluation 

Comparing the quality of two clusterings is a fundamental capability. It can be used to 
choose among alternative algorithms, inside a single algorithm as a stopping condition or 
as a guide to choose the next step in a so-called greedy approach, making an assignment 
that maximizes the quality improvement. However, evaluating clustering algorithms is an open problem, even when graphs without attributes or even tabular data are involved. This has been clearly discussed in recent surveys by Schaeffer (2007) and Fortunato (2010), where the identified problems not only concern the ambiguous and subjective definition of a good cluster, but also the need for results that are easier to interpret and use, for benchmark datasets, and for quality functions explaining why a clustering is regarded as good or not.

While evaluating graph clustering is a hard and open problem even when no attributes 
are present, several measures to evaluate graph clusterings have been proposed, and some 
have been extensively applied. Therefore, without claiming that these measures represent
the final or only solution to the problem, in this section we start from them as an existing 
way of evaluating graph clustering and focus on what we need to add when we deal with 
attributed graphs. 

The main additional aspect to consider when attributed graphs are involved is the co-existence of multiple objective functions. Having a description of the data that includes both structural and compositional aspects, we may have sets of nodes that are very similar according to their attributes but disconnected from each other. Similarly, we may have well-connected sets of nodes with rather heterogeneous compositional attributes. Both cases can be considered good clusters depending on the user requirements, and while we would certainly prefer to identify sets of nodes making a good cluster with respect to all these aspects, we must accept the co-existence of multiple evaluation functions (or a multi-objective evaluation function).

In the rest of this section, we introduce relevant evaluation measures for different as¬ 
pects involved in defining good attributed graph clusters. In order to demonstrate their 
differences, we apply these measures to a toy graph. 


4.1.1 Structural measures 

Evaluating the quality of a clustering of a simple graph without node or edge attributes 
is a complex problem in itself. In this section, we will consider two different scenarios: 
evaluation with and without ground truth. 


External evaluation measures. When ground truth is available, the problem is reduced to computing the similarity between two clusterings. Since we compare the discovered structures with externally provided class information, we call such similarity measures external evaluation measures. These measures can be divided into two main groups: those based on pair counting and those based on information theory. We will briefly discuss the most typical representatives to give readers an idea of the methods rather than a complete overview.

Given two partitions $\mathcal{C}_u = \{C_{u1}, C_{u2}, \dots, C_{um}\}$ and
$\mathcal{C}_v = \{C_{v1}, C_{v2}, \dots, C_{vl}\}$ of a set of $n$ nodes, the
pair-counting-based measures express the proportion of agreement between both
partitions. These measures have two requirements: (1) the groups within each partition
are disjoint, i.e., $C_i \cap C_j = \emptyset$ for $i \neq j$, and (2) all elements have the
same weight in the clustering process.

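The Rand index (RI) is one of the first approaches for comparing two partitions (Rand,
1971). It can be considered as an alternative to accuracy because it expresses the number
of node pairs on which both partitions agree (pairs placed in the same group in both
partitions, or in different groups in both partitions) divided by the number of all node
pairs. This comparison leads to a similarity function $c(\mathcal{C}_u, \mathcal{C}_v)$
between partitions that is expressed as

$$c(\mathcal{C}_u, \mathcal{C}_v) = \frac{\sum_{i<j} \gamma_{ij}}{\binom{n}{2}} \tag{6}$$

where, for each pair of nodes $x_i, x_j$,

$$\gamma_{ij} = \begin{cases} 1 & \text{if } \exists k, k' : x_i, x_j \in C_{uk} \wedge x_i, x_j \in C_{vk'}, \\ 1 & \text{if } \exists k, k' : x_i \in C_{uk}, x_j \notin C_{uk} \wedge x_i \in C_{vk'}, x_j \notin C_{vk'}, \\ 0 & \text{otherwise.} \end{cases}$$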
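As an illustration, the following minimal sketch computes Equation (6) directly by enumerating all node pairs. It assumes, as our own choice rather than the paper's, that each partition is encoded as a list of group labels indexed by node; the function name `rand_index` is hypothetical.

```python
from itertools import combinations


def rand_index(labels_u, labels_v):
    """Rand index of Eq. (6): share of node pairs on which two partitions agree.

    labels_u[x] and labels_v[x] are the group labels of node x in C_u and C_v
    (a hypothetical encoding chosen for this sketch).
    """
    if len(labels_u) != len(labels_v):
        raise ValueError("both partitions must cover the same nodes")
    n = len(labels_u)
    agreements = 0
    for i, j in combinations(range(n), 2):
        together_u = labels_u[i] == labels_u[j]
        together_v = labels_v[i] == labels_v[j]
        # gamma_ij = 1 if the pair is together in both or separated in both
        if together_u == together_v:
            agreements += 1
    return agreements / (n * (n - 1) / 2)


# Toy example: six nodes, two groups in C_u and three groups in C_v.
print(rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # 10 of 15 pairs agree
```

The pair enumeration is quadratic in the number of nodes, which is acceptable for toy graphs but motivates the contingency-matrix formulation for larger ones.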
The agreements between partitions $\mathcal{C}_u$ and $\mathcal{C}_v$ can be summarized
using a contingency matrix, as presented in Figure 11. In this matrix, $n_{ij}$ is the number
of nodes belonging both to the $i$th group of $\mathcal{C}_u$ and to the $j$th group of
$\mathcal{C}_v$, while $n_{i\cdot}$ is the number of elements of the $i$th group of the
$\mathcal{C}_u$ partition and $n_{\cdot j}$ is the number of elements in the $j$th group of
the $\mathcal{C}_v$ partition.


C v 
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n v 2 . 
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Fig. 11. A contingency matrix representing the agreements n,j between two partitions 


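Using a contingency matrix similar to the one presented in Figure 11, Equation (6) can be
re-expressed as

$$c(\mathcal{C}_u, \mathcal{C}_v) = \frac{\binom{n}{2} - \left[\frac{1}{2}\left(\sum_{i} n_{i\cdot}^2 + \sum_{j} n_{\cdot j}^2\right) - \sum_{i}\sum_{j} n_{ij}^2\right]}{\binom{n}{2}} \tag{7}$$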
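A minimal sketch of Equation (7) follows, again assuming that partitions are given as per-node label lists (a hypothetical encoding). The contingency counts $n_{ij}$, $n_{i\cdot}$ and $n_{\cdot j}$ are collected with `collections.Counter`, so no explicit enumeration of node pairs is needed; the function name is illustrative only.

```python
from collections import Counter


def rand_index_from_contingency(labels_u, labels_v):
    """Rand index via the contingency-matrix form of Eq. (7)."""
    n = len(labels_u)
    n_ij = Counter(zip(labels_u, labels_v))   # cell counts n_ij
    n_i = Counter(labels_u)                   # row sums n_i.
    n_j = Counter(labels_v)                   # column sums n_.j
    pairs = n * (n - 1) / 2                   # binomial(n, 2)
    squares_rows = sum(a * a for a in n_i.values())
    squares_cols = sum(b * b for b in n_j.values())
    squares_cells = sum(c * c for c in n_ij.values())
    # Number of node pairs on which the two partitions disagree
    disagreements = 0.5 * (squares_rows + squares_cols) - squares_cells
    return (pairs - disagreements) / pairs


print(rand_index_from_contingency([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

For the same toy partitions, the bracketed term of Equation (7) evaluates to 5 disagreeing pairs out of 15, giving the same value as the pairwise definition.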


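Note that $c(\mathcal{C}_u, \mathcal{C}_v) \in [0, 1]$, i.e., it is 0 when the partitions
disagree on every pair of nodes and 1 when the partitions are identical. Later, Hubert &
Arabie (1985) introduced the adjusted Rand index (ARI), a version of the Rand index
corrected for chance. The ARI is given by

$$\mathrm{ARI}(\mathcal{C}_u, \mathcal{C}_v) = \frac{\sum_{i}\sum_{j}\binom{n_{ij}}{2} - \left[\sum_{i}\binom{n_{i\cdot}}{2}\sum_{j}\binom{n_{\cdot j}}{2}\right] / \binom{n}{2}}{\frac{1}{2}\left[\sum_{i}\binom{n_{i\cdot}}{2} + \sum_{j}\binom{n_{\cdot j}}{2}\right] - \left[\sum_{i}\binom{n_{i\cdot}}{2}\sum_{j}\binom{n_{\cdot j}}{2}\right] / \binom{n}{2}} \tag{8}$$

where $n_{ij}$, $n_{i\cdot}$ and $n_{\cdot j}$ are the values taken from the contingency matrix in
Figure 11.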
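The sketch below evaluates Equation (8) from contingency counts, assuming per-node label lists as input (our encoding, not the paper's). The guard for a zero denominator, which occurs for instance when both partitions put all nodes in one group, returns 1.0; this is a convention we assume, not part of the original definition.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_u, labels_v):
    """Adjusted Rand index of Eq. (8) computed from contingency counts."""
    n = len(labels_u)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_u, labels_v)).values())
    sum_rows = sum(comb(a, 2) for a in Counter(labels_u).values())
    sum_cols = sum(comb(b, 2) for b in Counter(labels_v).values())
    expected = sum_rows * sum_cols / comb(n, 2)   # expected index under chance
    max_index = 0.5 * (sum_rows + sum_cols)
    denominator = max_index - expected
    if denominator == 0:
        # Degenerate case (assumed convention): treat as perfect agreement.
        return 1.0
    return (sum_cells - expected) / denominator


print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

Unlike the plain Rand index, the adjusted value is close to 0 for random labelings, which makes scores comparable across partitions with different numbers of groups.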

Another common measure is the Jaccard index, given by the ratio between the number of
node pairs clustered together in both partitions and the number of node pairs clustered
together in at least one partition (Jaccard, 1901).
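A minimal sketch of this pair-based Jaccard index, assuming per-node label lists as input (a hypothetical encoding). Returning 1.0 when no pair is co-clustered in either partition, i.e., both are all-singleton partitions, is our own assumed convention.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Set of node pairs (i, j), i < j, placed in the same group."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]}


def jaccard_index(labels_u, labels_v):
    together_u = co_clustered_pairs(labels_u)
    together_v = co_clustered_pairs(labels_v)
    union = together_u | together_v
    # Assumed convention: two all-singleton partitions count as identical.
    return len(together_u & together_v) / len(union) if union else 1.0


print(jaccard_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```

In the toy example, 2 pairs are co-clustered in both partitions and 7 in at least one, so the index is 2/7. Unlike the Rand index, pairs separated in both partitions do not count as agreements.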

The second group of external evaluation measures uses the mutual information (MI)
between partitions, i.e., the information both partitions share. These measures are based
on the entropy and joint entropy of the partitions. Using the same contingency matrix
presented in Figure 11, the MI index is given by

$$\mathrm{MI}(\mathcal{C}_u, \mathcal{C}_v) = \sum_{i=1}^{m}\sum_{j=1}^{l} \frac{n_{ij}}{n} \log \frac{n_{ij}/n}{n_{i\cdot}\,n_{\cdot j}/n^2} \tag{9}$$

This measure can be normalized, e.g., by the joint entropy of the partitions, so that the
normalized value lies within [0, 1] (variants adjusted for chance can also take negative
values). Variations of this measure with different normalizing factors, or with adjustments
correcting for chance, are presented in detail by Danon et al. (2005) and Vinh et al. (2010).
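The following sketch implements Equation (9) with natural logarithms and, following the text, normalizes it by the joint entropy of the two partitions; other normalizations (e.g., by the arithmetic mean of the two entropies) are equally common. Partitions are assumed to be per-node label lists, and the function names and the return value 1.0 for a zero joint entropy are our own assumptions.

```python
from collections import Counter
from math import log


def mutual_information(labels_u, labels_v):
    """MI of Eq. (9), in nats, from contingency counts."""
    n = len(labels_u)
    n_ij = Counter(zip(labels_u, labels_v))
    n_i, n_j = Counter(labels_u), Counter(labels_v)
    return sum((c / n) * log(n * c / (n_i[i] * n_j[j]))
               for (i, j), c in n_ij.items())


def joint_entropy(labels_u, labels_v):
    n = len(labels_u)
    return -sum((c / n) * log(c / n)
                for c in Counter(zip(labels_u, labels_v)).values())


def normalized_mi(labels_u, labels_v):
    """MI divided by the joint entropy, giving a value in [0, 1]."""
    h = joint_entropy(labels_u, labels_v)
    # Assumed convention: both partitions trivial means identical.
    return mutual_information(labels_u, labels_v) / h if h > 0 else 1.0


print(normalized_mi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions share no information
```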


Internal evaluation measures. Without ground truth, determining the quality of a
clustering relies on its intrinsic characteristics. We refer to such measures as internal
evaluation measures. According to Ben-David & Ackerman (2008), “a clustering quality
measure is a function that maps pairs of the form (dataset, clustering) to some ordered set
(say, the set of non-negative real numbers), so that these values reflect how good or
cogent that clustering is.” Some general properties of good quality measures have been
proposed, such as scale invariance, monotonicity and richness (Ben-David & Ackerman,
2008; van Laarhoven & Marchiori, 2013), but in practice the choice depends on the
purpose of the analysis.


To assess quality, Gaertler (2005) uses two functions, $f(\mathcal{C})$ and $g(\mathcal{C})$,
to measure, respectively, the density and the sparsity of the clustering. These functions are
combined as follows:

$$\mathrm{index}(\mathcal{C}) = \frac{f(\mathcal{C}) + g(\mathcal{C})}{N(G)} \tag{10}$$

where $N(G)$ is a normalization function for the index, defined as $\max\{f + g\}$ over all
clusterings (Brandes et al., 2008). Using the general index defined in Equation (10), three
different quality indices can be derived: coverage, conductance, and performance.

Coverage $\gamma(\mathcal{C})$ is the ratio of the intra-cluster edge weights to the total
amount of edge weights:

$$\gamma(\mathcal{C}) = \frac{\omega(E(\mathcal{C}))}{\omega(E)} \tag{11}$$

where $E(\mathcal{C})$ is the set of intra-cluster edges and $\omega(\cdot)$ is the sum of the
weights of a set of edges. According to the general definition in Equation (10),
$f = \omega(E(\mathcal{C}))$ and $g = 0$.
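A minimal sketch of Equation (11), assuming the graph is given as a list of weighted `(u, v, weight)` triples and the clustering as a dictionary from node to cluster id (hypothetical encodings). The toy graph of two triangles joined by one bridge edge is ours and is reused only for illustration.

```python
def coverage(edges, labels):
    """Coverage of Eq. (11).

    edges: list of (u, v, weight) triples; labels: dict node -> cluster id.
    """
    total = sum(w for _, _, w in edges)                               # omega(E)
    intra = sum(w for u, v, w in edges if labels[u] == labels[v])     # omega(E(C))
    return intra / total


# Two triangles joined by a single bridge edge (c, d), all weights equal to 1.
EDGES = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1), ("c", "d", 1)]
LABELS = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(coverage(EDGES, LABELS))  # 6 of the 7 unit-weight edges are intra-cluster
```

Note that putting all nodes in one cluster trivially yields a coverage of 1, which is why coverage is usually read together with other measures.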

Conductance $\phi(G)$ is a measure based on the observation that if a cluster is well
connected, then a large number of edges have to be removed in order to bisect it. Thus,
the conductance $\phi(G)$ of a graph $G$ is the minimum conductance value over all cuts of
$G$ (Brandes et al., 2008), that is, the lowest value, over all bipartitions of the nodes, of the
total weight of the edges crossing the cut relative to the edge weight incident to the lighter
side. Along with the graph conductance, two other measures exist: intra-cluster
conductance $\alpha(\mathcal{C})$ and inter-cluster conductance $\delta(\mathcal{C})$.
Intra-cluster conductance is the minimum conductance value over all induced subgraphs
$G[C_i]$, while inter-cluster conductance is derived from the maximum conductance over all
induced cuts $(C_i, V \setminus C_i)$. Thus, given a cut $(C, \bar{C})$ with
$\bar{C} = V \setminus C$, according to Brandes et al. (2008), the conductances $\phi(C)$ and
$\phi(G)$ can be defined as follows:

$$\phi(C) = \begin{cases} 1, & C \in \{\emptyset, V\} \\ 0, & C \notin \{\emptyset, V\} \wedge \omega(E(C, \bar{C})) = 0 \\ \dfrac{\omega(E(C, \bar{C}))}{\min(a(C), a(\bar{C}))}, & \text{otherwise} \end{cases} \tag{12}$$

$$\phi(G) = \min_{C \subseteq V} \phi(C) \tag{13}$$

where $E(C, \bar{C})$ is the set of edges crossing the cut and $a(C)$ is the sum of the weights
of all edges incident to $C$. It is expressed as

$$a(C) = 2\sum_{e \in E(C)} \omega(e) + \sum_{f \in E(C, \bar{C})} \omega(f),$$

where $E(C)$ denotes the edges with both endpoints in $C$. The intra-cluster conductance of
a partition $\mathcal{C} = \{C_1, \dots, C_k\}$ is defined as

$$\alpha(\mathcal{C}) = \min_{i \in \{1, \dots, k\}} \phi(G[C_i]), \tag{14}$$

while the inter-cluster conductance of a partition $\mathcal{C}$ is defined as

$$\delta(\mathcal{C}) = \begin{cases} 1, & \mathcal{C} = \{V\} \\ 1 - \max_{i \in \{1, \dots, k\}} \phi(C_i), & \text{otherwise.} \end{cases} \tag{15}$$

To express the preceding indices in the form of the general framework from Equation (10),
we set $g = 0$ for intra-cluster conductance, $f = 0$ for inter-cluster conductance, and
$N(G) = \max\{f + g\} = 1$ in both cases.
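The sketch below follows Equations (12) to (15) literally on weighted `(u, v, weight)` edge lists. It computes $\phi(G)$ by brute force over all node subsets, which is exponential in $|V|$ and therefore only usable on toy graphs; exact minimum-conductance cuts are computationally hard in general, so practical tools rely on approximations. All function names are hypothetical.

```python
from itertools import combinations


def volume(nodes, edges):
    """a(C): weight of edges incident to C; edges inside C count twice."""
    return sum(w * ((u in nodes) + (v in nodes)) for u, v, w in edges)


def cut_conductance(nodes, all_nodes, edges):
    """phi(C) of Eq. (12) for the cut (C, V - C)."""
    nodes = set(nodes)
    rest = set(all_nodes) - nodes
    if not nodes or not rest:                 # C is the empty set or V
        return 1.0
    cut = sum(w for u, v, w in edges if (u in nodes) != (v in nodes))
    if cut == 0:
        return 0.0
    return cut / min(volume(nodes, edges), volume(rest, edges))


def graph_conductance(all_nodes, edges):
    """phi(G) of Eq. (13) by brute force over all cuts (exponential in |V|)."""
    all_nodes = list(all_nodes)
    best = 1.0                                # value of the trivial cuts
    for size in range(1, len(all_nodes)):
        for subset in combinations(all_nodes, size):
            best = min(best, cut_conductance(subset, all_nodes, edges))
    return best


def intra_cluster_conductance(clusters, edges):
    """alpha of Eq. (14): minimum phi over the induced subgraphs G[C_i]."""
    def induced(members):
        return [(u, v, w) for u, v, w in edges if u in members and v in members]
    return min(graph_conductance(set(c), induced(set(c))) for c in clusters)


def inter_cluster_conductance(clusters, all_nodes, edges):
    """delta of Eq. (15)."""
    if len(clusters) == 1:
        return 1.0
    return 1.0 - max(cut_conductance(c, all_nodes, edges) for c in clusters)


E = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
     ("d", "e", 1), ("e", "f", 1), ("d", "f", 1), ("c", "d", 1)]
V = ["a", "b", "c", "d", "e", "f"]
P = [["a", "b", "c"], ["d", "e", "f"]]
print(intra_cluster_conductance(P, E), inter_cluster_conductance(P, V, E))
```

On the two-triangle toy graph, each triangle is internally well connected, and the bridge cut has weight 1 against a volume of 7 on each side.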

Performance defines the quality of a partition based on the “correctness” of the
classification of node pairs. The density function $f$ counts the number of edges within all
clusters, while the sparsity function $g$ counts the “nonexistent edges” between clusters
(Gaertler, 2005), that is, the number of pairs of nodes in different clusters that are not
connected. The definitions are

$$f(\mathcal{C}) = \sum_{i=1}^{k} |E(C_i)|, \qquad g(\mathcal{C}) = \sum_{\{u, v\} \subseteq V} [(u, v) \notin E] \cdot I(u, v), \tag{16}$$

where $[\cdot]$ is 1 if its condition holds and 0 otherwise, and the function $I$ is defined as

$$I(u, v) = \begin{cases} 1, & u \in C_i \wedge v \in C_j,\ i \neq j \\ 0, & \text{otherwise.} \end{cases} \tag{17}$$

Finally, performance as presented by Brandes et al. (2008) is

$$\mathrm{perf}(\mathcal{C}) = \frac{f(\mathcal{C}) + g(\mathcal{C})}{\frac{1}{2}n(n-1)} \tag{18}$$

where $n$ is the number of nodes of the graph.
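A minimal sketch of Equation (18) for a simple, unweighted, undirected graph, assuming edges are given as `(u, v)` pairs and the clustering as a node-to-cluster dictionary (hypothetical encodings). Each unordered pair is counted as correct when it is an intra-cluster edge (contributing to $f$) or a non-adjacent inter-cluster pair (contributing to $g$).

```python
from itertools import combinations


def performance(nodes, edges, labels):
    """perf of Eq. (18) for a simple undirected, unweighted graph.

    nodes: list of nodes; edges: list of (u, v) pairs; labels: node -> cluster id.
    """
    adjacent = {frozenset(e) for e in edges}
    correct = 0
    for u, v in combinations(nodes, 2):
        same_cluster = labels[u] == labels[v]
        linked = frozenset((u, v)) in adjacent
        # Intra-cluster edges count towards f, absent inter-cluster edges
        # towards g; all other pairs are "misclassified".
        if same_cluster == linked:
            correct += 1
    n = len(nodes)
    return correct / (n * (n - 1) / 2)


NODES = ["a", "b", "c", "d", "e", "f"]
PAIRS = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
CLUSTERS = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(performance(NODES, PAIRS, CLUSTERS))
```

On the two-triangle graph, $f = 6$ and $g = 8$ (the bridge is the only misclassified pair), so performance is 14/15.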

A comparison of clustering algorithms and measures has been provided by Leskovec et al.
(2010), and more details concerning the limitations of these measures can be found in the
works by Gaertler (2005) and Brandes et al. (2008).


Other candidates for a quality measure are density and modularity (Newman & Girvan,
2004; Fortunato, 2010), which can also be optimized directly instead of just being used as
evaluation functions. We do not give further details about modularity, which has already
been described earlier in the article.

Figure 12 illustrates some of these measures on two alternative partitions of the same
graph.


4.1.2 Edge-attributed graph clustering 


Only a few works have proposed evaluation measures for multiple graphs. The measure
introduced by Mucha et al. (2010) takes into account both the pairs of nodes and the pairs
of graphs; this approach has already been described in a previous section.

A different approach is given by Boden et al. (2012). In the spirit of subspace clustering,
a set of “interesting” non-redundant clusters is sought. Candidate multidimensional
clusters are all the node sets that are densely connected in every respective dimension
(i.e., in all the single layers contained in the cluster). From these, the result is selected by
maximizing the quality sum $\sum_{C} Q(C)$ over all clusters while keeping the set of
clusters non-redundant. Redundancy is computed as the overlap of edges between two
clusters. The quality function $Q(C)$ is meant to be specified by users, since it is
application-dependent. Nevertheless, the authors provide a default quality function that
multiplies the average density over the layers, the size and the dimensionality of the
cluster. Additionally, the minimum cluster size is set to 8 nodes, and a minimum of 2
dimensions is required for each cluster. This evaluation measure is bound to a specific
cluster model. Moreover, it is limited to finding multidimensional clusters that are
clustered in all their single layers at the same time (this results from the condition on the
candidate clusters).

Fig. 12 (panel values). Ground truth (upper graph): modularity 0.347, coverage 0.867.
First partition: modularity 0.357, coverage 0.867, Rand index 0.8, adjusted RI 0.597,
MI index 0.52, normalized MI 0.619. Second partition: modularity 0.263, coverage 0.8,
Rand index 0.64, adjusted RI 0.286, MI index 0.764, normalized MI 0.433.

Fig. 12. Two graph partitions (lower graphs) and ground truth (upper graph): the values
of some internal (modularity, coverage) and external evaluation measures (computed
according to the ground truth) are shown

The problem of measuring distances between clusterings of graphs with weighted edges
of multiple types is also tackled by Rocklin & Pinar (2011).


4.1.3 Node-attributed graph clustering 

Node-attributed graph clustering approaches like the ones by Zhou et al. (2009), Cruz et al.
(2014) and Dang & Viennet (2012) use a combination of two measures: density $\delta$ for the
structural part and entropy $\mathcal{H}$ for the attributes. Given a graph $G(V, E)$ and a
partition $\mathcal{C} = \{C_1, C_2, \dots, C_k\}$ of $G$, density is defined as

$$\delta(\mathcal{C}) = \sum_{C_i \in \mathcal{C}} \frac{|E(C_i)|}{|E|}, \tag{19}$$

where $E(C_i)$ is the set of edges that start and finish in the $i$th community. That is, density
represents the proportion of edges that lie within the communities, and a higher density
corresponds to a better clustering.

The term entropy, used in several different contexts to measure the degree of disorder of
a complex system, here indicates the heterogeneity of the elements inside a cluster
according to their attribute values. It is given by

$$\mathcal{H}(\mathcal{C}) = \sum_{C_i \in \mathcal{C}} \frac{|C_i|}{|V|} H(C_i), \tag{20}$$

where $H(C_i)$ is the entropy of the $i$th community and is calculated as

$$H(C_i) = -\sum_{j=1}^{r} \left[ p_{ij} \ln p_{ij} + (1 - p_{ij}) \ln (1 - p_{ij}) \right],$$

where $r$ is the number of attributes and $p_{ij}$ is the proportion of elements in the
community $C_i$ with a given value on attribute $j$ (each attribute is treated as binary). The
objective of the clustering is to reduce the entropy, which is equivalent to increasing the
homogeneity of the partition.
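The two measures can be sketched as follows. We assume, for illustration only, that every node carries a tuple of 0/1 attribute values (matching the binary form of $H(C_i)$), that edges are `(u, v)` pairs and that communities are lists of nodes; all names are hypothetical. Terms with $p = 0$ or $p = 1$ are skipped, using the usual convention $0 \ln 0 = 0$.

```python
from math import log


def density(edges, labels):
    """delta of Eq. (19): fraction of edges lying inside communities."""
    return sum(1 for u, v in edges if labels[u] == labels[v]) / len(edges)


def community_entropy(members, attrs):
    """H(C_i) for binary attributes; attrs[x] is a tuple of 0/1 values."""
    r = len(attrs[members[0]])
    h = 0.0
    for j in range(r):
        p = sum(attrs[x][j] for x in members) / len(members)   # p_ij
        for q in (p, 1.0 - p):
            if q > 0:                      # convention: 0 * ln 0 = 0
                h -= q * log(q)
    return h


def partition_entropy(clusters, attrs):
    """Eq. (20): community entropies weighted by |C_i| / |V|."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * community_entropy(c, attrs) for c in clusters)


ATTRS = {"a": (1,), "b": (1,), "c": (0,), "d": (0,)}
print(partition_entropy([["a", "b"], ["c", "d"]], ATTRS))  # homogeneous communities
print(partition_entropy([["a", "c"], ["b", "d"]], ATTRS))  # maximally mixed: ln 2
```

A good node-attributed clustering should score high on `density` and low on `partition_entropy`; the two values are kept separate here, in line with the multi-objective view of this section.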

Another validation technique is presented by Li et al. (2008). In this work, documents
are classified into ACM’s 17 major computer science categories. This is a fuzzy
classification that allows each document to belong to several categories. Thus, each
document $d_i$ is assigned a (17-dimensional) topic vector $z_i$, and the documents are then
clustered into $K$ groups. Each group $C_j$ is further assigned a topic vector $Z_j$.

The paper defines a measure called PCS as

$$\mathrm{PCS} = \frac{\sum_{j=1}^{K} \mathrm{PCS}_j}{K} \tag{21}$$

where $K$ is the number of communities and $\mathrm{PCS}_j$ is

$$\mathrm{PCS}_j = \frac{\sum_{k : d_{jk} \in C_j} \eta(d_{jk})}{n_j},$$

where $n_j$ is the size of community $j$ and

$$\eta(d_{jk}) = \begin{cases} 1 & \text{if } z_{jk} = Z_j \\ 0 & \text{otherwise.} \end{cases}$$

Thus, for each cluster $C_j$, the measure computes the proportion of elements
$d_{jk} \in C_j$ such that $z_{jk} = Z_j$, i.e., how many documents within the community
have a topic vector equal to the community’s topic vector.
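A minimal sketch of Equation (21) follows. How the community topic vectors $Z_j$ are obtained is not specified here, so the sketch simply takes them as input; documents are assumed to be identifiers mapped to topic vectors stored as tuples, and all names are hypothetical.

```python
def pcs(clusters, doc_topic, cluster_topic):
    """PCS of Eq. (21).

    clusters: list of lists of document ids (the groups C_j);
    doc_topic: dict document id -> topic vector z (as a tuple);
    cluster_topic: list holding the topic vector Z_j of each group.
    """
    per_cluster = []
    for members, z_cluster in zip(clusters, cluster_topic):
        matches = sum(1 for d in members if doc_topic[d] == z_cluster)  # sum of eta(d_jk)
        per_cluster.append(matches / len(members))                      # PCS_j
    return sum(per_cluster) / len(per_cluster)                          # average over K


TOPICS = {"d1": (1, 0), "d2": (0, 1), "d3": (0, 1)}
print(pcs([["d1", "d2"], ["d3"]], TOPICS, [(1, 0), (0, 1)]))
```

Because $\eta$ requires an exact match of topic vectors, a document sharing only some categories with its community counts as a miss.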

In some cases, it is possible to define the number and labels of the groups by hand, as
done by Ge et al. (2008), where the authors compare the obtained partition with the
expected one by counting the number of elements classified correctly by an algorithm.
This approach is acceptable for small networks but becomes prohibitive for large networks
with high-dimensional feature spaces.

When ground truth is available, it is possible to use validation methods such as the Rand
index or the mutual information index. Along these lines, Combe et al. (2012) define a
framework for comparing the resulting partition with the ground truth. They use a
contingency matrix (similar to the one presented in Figure 11) created from the ground
truth and the partition found by the tested algorithm, and then calculate the proportion of
nodes that were correctly grouped according to the ground truth.


ZU064-05-FPR article 9 January 2015

1:22

Clustering attributed graphs

29

Yang et al. (2009) use two validation approaches that are based on ground truth: the
normalized mutual information (NMI), briefly described in Section 4.1.1, and the pairwise
F measure (PWF). The PWF measure combines pairwise precision and recall as follows:

$$\mathrm{PWF} = \frac{(1 + \beta^2) \cdot \mathrm{precision} \cdot \mathrm{recall}}{\beta^2 \cdot \mathrm{precision} + \mathrm{recall}} \tag{22}$$

where $\beta > 0$ is a parameter used to favor either precision or recall; it is common to set
$\beta = 1$. Precision and recall are calculated as

$$\mathrm{precision} = \frac{|S \cap T|}{|S|}, \qquad \mathrm{recall} = \frac{|S \cap T|}{|T|},$$

where $S$ is the set of node pairs that are assigned to the same community and $T$ is the set
of node pairs that have the same label.
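A minimal sketch of the pairwise F measure, assuming both the found clustering and the ground truth are per-node label lists (a hypothetical encoding). Returning 0.0 when no pair is shared is our own convention, which also avoids a division by zero.

```python
from itertools import combinations


def same_group_pairs(labels):
    """Node pairs (i, j), i < j, that share a group label."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]}


def pairwise_f(found, truth, beta=1.0):
    """PWF of Eq. (22) from the pair sets S (found) and T (ground truth)."""
    s = same_group_pairs(found)
    t = same_group_pairs(truth)
    hits = len(s & t)
    if hits == 0:
        return 0.0                        # assumed convention: no shared pairs
    precision = hits / len(s)
    recall = hits / len(t)
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)


print(pairwise_f([0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1]))
```

In the toy example, the found clustering places 3 pairs together, 2 of which are among the 6 ground-truth pairs, so precision is 2/3, recall is 1/3 and PWF with $\beta = 1$ is 4/9.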


4.1.4 A multi-objective evaluation approach 


In the previous sections we introduced several evaluation measures and saw that, in
general, finding a good clustering of an attributed graph requires the optimization of at
least two objective functions. Therefore, there is always a trade-off between the
compositional and structural dimensions. For node-attributed graphs, the objectives are
the structural quality of the clusters (intra-cluster vs. inter-cluster edges) and the
intra-cluster homogeneity of the node attributes. For edge-attributed graphs, the situation
is more complicated, since it is less obvious how to define a good clustering. According to
Boden et al. (2012), cluster candidates must be well clustered in all of their dimensions, but
this assumption could prevent the discovery of potentially useful clusters.

Another possible evaluation perspective consists in checking no longer whether a
clustering is good as a whole, but whether specific interesting clusters are found. In
general, to evaluate a specific cluster in an attributed graph, one can take into
consideration its structural quality, the homogeneity of its node attributes, its size, its
dimensionality and its novelty. We can thus see these variables as different dimensions of
a search space where each multidimensional point is a cluster. Good clusters can be
selected based on custom weights for these dimensions, or unweighted approaches such
as the Pareto front can be used to find all clusters that are not dominated by any other
cluster, i.e., that are potentially better than the others according to some combination of
these basic evaluation functions.

For structural quality and node homogeneity, any measure from Sections 4.1.1 and 4.1.3
may be selected. To assess novelty, we suggest using one of the proposed measures of
overlap, such as the Jaccard index: the maximum overlap of a cluster with the other
clusters can be used to quantify its novelty, a lower maximum overlap indicating a more
novel cluster. In this way, emerging clusters of minimal dimensionality are favored,
preventing information overload.
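To make the multi-objective view concrete, the following sketch scores novelty as 1 minus the maximum Jaccard overlap with any other cluster (one possible reading of the suggestion above, not a definition from the cited works) and then keeps the clusters on the Pareto front of their score vectors. Clusters are assumed to be node sets, scores are tuples where higher is better, and all names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index between two node sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def novelty(cluster, others):
    """Hypothetical novelty score: 1 minus the largest overlap with another cluster."""
    return 1.0 - max((jaccard(cluster, o) for o in others), default=0.0)


def dominates(p, q):
    """True if score vector p is at least as good as q everywhere and better somewhere."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def pareto_front(scores):
    """Clusters whose score vectors (higher is better) no other cluster dominates."""
    return [c for c, p in scores.items()
            if not any(dominates(q, p) for d, q in scores.items() if d != c)]


# Score vectors could be, e.g., (structural quality, attribute homogeneity).
SCORES = {"A": (0.9, 0.2), "B": (0.5, 0.5), "C": (0.4, 0.4)}
print(pareto_front(SCORES))  # C is dominated by B
```

Further criteria such as size, dimensionality or the novelty score can simply be appended to each tuple; the Pareto test itself does not need any weights.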


4.2 Applicability 

Approaches that preprocess edge- or node-attributed graphs by reducing them to graphs
without attributes normally keep the same asymptotic complexity as the clustering
algorithm used after preprocessing. The exact complexity of the preprocessing phase
depends on the data structure and the specific flattening algorithm, but it is normally
achievable in close-to-linear time in the size of the graph. As an example, edge-attributed
flattening as described in Definition 3, using a tree-based main memory indexing structure,
takes $O(m \log m)$ time, where $m$ is the number of edges, that is, the average number of
edges per edge type times the number of edge types.
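Definition 3 is not reproduced in this section, so the sketch below assumes one plausible flattening rule for illustration: two nodes are linked in the flattened graph if they are linked in at least one edge type, and the weight counts the distinct edge types linking them. It uses a Python dictionary (a hash table) instead of the tree-based index mentioned above, so its expected cost is linear in $m$ rather than $O(m \log m)$; node identifiers are assumed to be comparable.

```python
from collections import defaultdict


def flatten(typed_edges):
    """Flatten an undirected multi-layer graph into a weighted simple graph.

    typed_edges: iterable of (u, v, edge_type) triples.
    Returns {(u, v): weight} with u <= v, where the weight is the number of
    distinct edge types linking u and v (an assumed flattening rule).
    """
    types = defaultdict(set)
    for u, v, t in typed_edges:
        key = (u, v) if u <= v else (v, u)   # undirected: store each pair once
        types[key].add(t)
    return {pair: len(ts) for pair, ts in types.items()}


LAYERS = [("a", "b", "email"), ("b", "a", "friend"),
          ("b", "c", "email"), ("a", "b", "email")]
print(flatten(LAYERS))
```

Counting edge types is only one way to merge layers; as discussed next, merging weights with different semantics generally requires domain knowledge.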

Although these methods do not take full advantage of the information represented by the
different edge types, they can be applied to very large graphs using any of the existing
efficient clustering algorithms reported, e.g., by Coscia et al. (2011); they are simple to
implement and (with some variation in the flattening algorithm) can also be applied to
directed and weighted graphs. However, in the case of weighted edge-attributed graphs,
domain knowledge is necessary to decide how to merge the weights of different edge types.
The conceptual problem of merging weights with different semantics described by
Magnani & Rossi (2013b), e.g., the number of exchanged messages on an email layer and
the duration of friendship on a social media network, emphasizes the deficiencies of
single-layer approaches.

Similarly, for node-attributed graphs where the node attributes are flattened into edge
weights before applying a community detection algorithm, the time complexity of the
preprocessing step depends on the number of attributes and on the method used to
compute how similar the nodes are. In the case of matching similarity, the number of
common attributes between the end nodes of each edge is computed, which takes $O(mf)$
time, where $f$ is the size of the attribute space and, in general, $f \ll m$. In
high-dimensional spaces, we expect each node to be described by a sparse vector, which
allows for efficient methods such as growing self-organizing maps. These methods, when
coupled with efficient graph clustering, exhibit near-linear complexity. The other methods
still take advantage of the sparse nature of the graph and thus, having less than quadratic
complexity, are able to address large datasets.
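A minimal sketch of the matching-similarity preprocessing, assuming, for illustration only, that each node carries a tuple of $f$ categorical attribute values and that edges are `(u, v)` pairs. One pass over the $m$ edges with $f$ comparisons per edge gives the $O(mf)$ cost stated above; the function name is hypothetical.

```python
def matching_weights(edges, attrs):
    """Edge weight = number of attribute positions where both end nodes agree.

    edges: list of (u, v) pairs; attrs: dict node -> tuple of f attribute values.
    Runs in O(m * f) time for m edges.
    """
    return {(u, v): sum(a == b for a, b in zip(attrs[u], attrs[v]))
            for u, v in edges}


NODE_ATTRS = {"a": ("x", "y", 1), "b": ("x", "z", 1), "c": ("w", "z", 0)}
print(matching_weights([("a", "b"), ("b", "c")], NODE_ATTRS))
```

With sparse attribute vectors, iterating only over the non-zero entries of each node would reduce the per-edge cost below $f$.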

With more integrated methods, such as subspace approaches or the one proposed by
Ruan et al. (2013), the clustering process can reach a high complexity: quadratic or more.
In general, however, for linear combination or walk-based methods, the complexity
depends on the algorithm used for clustering the features, e.g., SOM or k-means among
others, and on whether the approach is global or local. The resulting process can still be
used in practice on reasonably large graphs, and graphs with a hundred thousand nodes
have been successfully processed in the reviewed works on subspace clustering.

On the other hand, most community detection algorithms require the choice of
parameters that control their output, for example the number of clusters $k$, the weight
$\alpha$ emphasizing connectivity or the weighting variables of linear combination
approaches, the number of iterations, statistical distributions for model-based methods,
and redundancy parameters or heuristics in NP-hard subspace approaches. This sometimes
requires major assumptions and domain knowledge about the data, which reduces their
applicability. Only a few of the methods reported in this work are parameter-free,
including the ones by Neville et al. (2003), Cruz et al. (2011a) and Akoglu et al. (2012).

Regarding the directionality of the edges, most of the methods described in this article
rely on existing approaches to analyze the structural part of the graph, in which case any
existing algorithm for directed graphs can be used. This evidently applies to the
single-layer and weight-modification approaches, and it is also the case for subspace
methods, even if these may require some adaptation when specific algorithms have been
hardcoded inside them. Methods based on extended modularity cannot be used on directed
graphs without modification, because they are based on the original definition of
modularity, which assumes undirected edges. However, they can be extended in the same
way as has been done by Nicosia et al. (2009) for non-attributed graphs. With respect to
node-attributed graphs, approaches based on linear combination can be used
straightforwardly with directed edges, as they are based on the computation of graph
distances, which can be obtained on directed graphs as well. Similarly, walk-based
approaches are naturally well suited to directed graphs.

As a final consideration, the works we have mentioned so far are all based on the
general idea of clustering several dimensions at the same time, e.g., relationships,
affiliation, competencies and socio-demographic features. However, the information
stored in the attributes and in the edges may be uncorrelated and will not necessarily
reinforce the same clusters. In practice, trying to merge several dimensions may result in
failing to find any well-separated clusters even when clusters exist in a single dimension.
An alternative approach is to run dedicated, specialized clustering steps for each
dimension (structure, edge attributes, etc.) and then integrate the resulting partitions a
posteriori, only if this leads to better clusters. Cruz & Bothorel (2013) propose to
manipulate the partitions with a contingency matrix where structural groups are in rows
and compositional ones are in columns. The integration of the partitions relies on
predefined strategies. Even if matrix manipulation may not seem user-friendly, this
original proposal is interesting from another perspective: according to their objectives,
analysts can try different combinations without recomputing the basic partitions, and thus
potentially save computational costs.


5 Open problems and discussion 


Attributed graph clustering is an active research area and, as such, it presents a number
of open problems. In addition, since it extends and combines well-established areas
(graph clustering and multi-dimensional relational clustering), its open problems can be
classified into two main categories: 1) those already present when single graphs are
considered (and thus easier to identify) and 2) those specifically related to the
combination of structure and attributes.

An example of the first category pertains to partitioning and overlapping algorithms.
While the majority of graph clustering methods partition nodes into disjoint sets, many
authors have pointed out that in real contexts individuals often belong to multiple
communities. Even without considering attributes, this has motivated the development of
several methods, such as the well-known clique percolation method by Palla et al. (2005),
or the ones by Nicosia et al. (2009) and Wang et al. (2011), where extended versions of
modularity are used to evaluate overlapping clusters. In their recent paper, Xie et al. (2013)
review the state of the art in overlapping community detection algorithms, quality
measures, and benchmarks for non-attributed graphs. They provide a framework to
evaluate the performance of both community-level and node-level detection, and conclude
that this research field is still a work in progress, as more than 70% of the overlaps remain
uncovered. Other problems include how to measure the significance of overlapping nodes
and how to interpret the resulting communities (Xie et al., 2013). Recently, Yang et al.
(2013) have used node-attributed graphs for detecting overlapping communities, stating
that the resulting communities can be interpreted more easily by analyzing the attributes
of the nodes belonging to each community. However, quality and interpretation issues are
still open questions.

In general, as with other kinds of approaches, the presence of attributes introduces more
parameters and requires multiple aspects to be considered at the same time. However, in
our opinion, when edge attributes are present, the debate between partitioning and
overlapping approaches should be reconsidered. In fact, overlapping is usually determined
by participation in different networks: for example, the same individual can be in her work
team community, in her family community, in the community of her teammates at the
fencing club, etc. This example suggests that if we can split our social network into a set of
specialized networks (or, in other words, if we can cluster our relationships into different
classes), then we may find that some specialized networks only involve partitions.
However, this consideration should not be understood as a statement against overlapping
methods.

An example of the second category of open problems is the exponential explosion in
the number of attribute value combinations to be considered during the clustering process.
While this is a well-known problem in relational data mining, it does not arise in single
graph clustering, and it is one of the main aspects reviewed in this article. In Section 2 we
hypothesized that clusters can emerge when a specific combination of graphs is
considered, and disappear when more graphs are added to the model. In Section 3 we
discussed the notions of point of view and subspace clustering to counteract the fact that
considering all the node attributes may lead to the curse of dimensionality. Furthermore,
beyond the quantitative selection of a good subset of the original data (which can be
stated as a feature selection problem), scientists will have to take qualitative
considerations into account: how should the analysis context be defined in order to decide
how good a clustering is? How can this context be made understandable to an analyst
without domain knowledge, and usable by a domain expert without deep analytical skills?
How can efficient techniques be conceived to present multiple results in real time?

The main problem related to the existence of multiple points of view, which is peculiar to
graphs with edge labels (and, more generally, to multiple interconnected graphs), is the
large number of views, where every view corresponds to a specific combination of values
on the edges. Despite some promising attempts to address this problem, inspired by the
field of subspace clustering, in the authors’ opinion this aspect deserves much more
research before clustering algorithms can be applied to real online social networks. Given
the intrinsic computational complexity of the problem, a possible direction involves using
domain knowledge to focus the cluster discovery process on promising combinations of
dimensions.


Initial work in this direction by Cruz et al. (2013) has developed control facilities to
combine existing precomputed partitions. The objective is to offer tools to compare
different approaches and to visualize the results in a way that allows user feedback. The
success of UCINET and, more recently, of visual analytics software like Gephi and NodeXL
is a sign that analysts are requesting such easy-to-apply tools. This requires advances
focusing on usability, simplicity, efficiency and scalability, and on evaluation facilities
such as the comparison of methods, the selection of relevant attributes and/or modeling.
In fact, these research directions are as meaningful in an attributed-graph context as they
are for non-attributed graphs.

Understandably, early works on attributed graph clustering have focused on finding static
communities, which is a preliminary and necessary step to study their evolution. Here,
researchers can partially reuse the approaches used to find evolving communities in simple
graphs, in particular the comparison of nodes clustered at different timestamps to identify
evolutionary steps like creation, merging and splitting. However, in the case of attributed
graphs, evolution does not only concern the networks. The existence of multiple
interconnected graphs and of communities spanning some of them may also require a
revision of the concept of evolutionary step.

A related problem that has generated a whole research subfield in the realm of simple
graphs is the study of network creation models. What are the forces leading a network to
exhibit a modular structure? Rephrasing this question in the context of attributed graphs,
how can we explain not only how some people have become densely interconnected, i.e.,
a cluster, but also why their attributes follow a specific value distribution and how these
connections have developed in the different graph layers or edge types?

All the aspects mentioned so far highlight different levels of increasing complexity
that we have to face when we consider attributes: the number of views to evaluate, the
number of parameters to consider, e.g., in the evaluation functions, and the number of
configurations of the system, e.g., the additional degrees of freedom in its evolution. A
straightforward conclusion is that, in the case of attributed graphs, the applicability of
forthcoming results may strictly depend on algorithmic advances, in particular regarding
computational models like streaming, distributed, budget-based, approximate and
incremental approaches enabling big data analysis.

Bibliography 

Aggarwal, Charu, & Wang, Haixun. (2010). Managing and Mining Graph Data. Springer US.

Akoglu, Leman, Tong, Hanghang, Meeder, Brendan, & Faloutsos, Christos. (2012). PICS:
Parameter-free identification of cohesive subgroups in large attributed graphs. Pages
439-450 of: SDM. SIAM / Omnipress.

Atzmueller, Martin, & Mitzlaff, Folke. (2011). Efficient descriptive community mining.
Proc. 24th Intl. FLAIRS Conference. AAAI Press.

Balasubramanyan, Ramnath, & Cohen, William W. (2011). Block-LDA: Jointly modeling
entity-annotated text and entity-entity links. Chap. 39, pages 450-461.

Barabasi, Albert-Laszlo, & Albert, Reka. (1999). Emergence of scaling in random 
networks. Science, 286(5439), 509-512. 

Barbieri, Nicola, Bonchi, Francesco, & Manco, Giuseppe. (2013). Cascade-based
community detection. Pages 33-42 of: Proceedings of the Sixth ACM International
Conference on Web Search and Data Mining. WSDM ’13. New York, NY, USA: ACM.

Ben-David, Shai, & Ackerman, Margareta. (2008). Measures of clustering quality: A
working set of axioms for clustering. Pages 121-128 of: Advances in Neural Information
Processing Systems.

Berlingerio, Michele, Coscia, Michele, & Giannotti, Fosca. (2011a). Finding and
Characterizing Communities in Multidimensional Networks. Pages 490-494 of: 2011
International Conference on Advances in Social Networks Analysis and Mining. IEEE.





34 C. Bothorel, J. D. Cruz, M. Magnani and B. Micenková

Berlingerio, Michele, Coscia, Michele, Giannotti, Fosca, Monreale, Anna, & Pedreschi,
Dino. (2011b). Foundations of Multidimensional Network Analysis. Pages 485-489
of: 2011 International Conference on Advances in Social Networks Analysis and Mining.
IEEE.

Berlingerio, Michele, Pinelli, Fabio, & Calabrese, Francesco. (2013). ABACUS: Apriori- 
BAsed Community discovery in multidimensional networks. Data Mining and 
Knowledge Discovery, Springer, 27(3).

Blondel, Vincent D., Guillaume, Jean-Loup, Lambiotte, Renaud, & Lefebvre, Etienne. 
(2008). Fast unfolding of communities in large networks. Journal of statistical 
mechanics: Theory and experiment, 2008(10), P10008 (12pp). 

Boden, Brigitte, Günnemann, Stephan, Hoffmann, Holger, & Seidl, Thomas. (2012).
Mining coherent subgraphs in multi-layer graphs with edge labels. Page 1258 of: 
Proceedings of the 18th acm sigkdd international conference on knowledge discovery 
and data mining - kdd ’12. New York, USA: ACM Press. 

Bonchi, Francesco, Gionis, Aristides, Gullo, Francesco, & Ukkonen, Antti. (2012). 
Chromatic correlation clustering. Page 1321 of: Proceedings of the 18th acm sigkdd 
international conference on knowledge discovery and data mining - kdd ’12. New York, 
USA: ACM Press. 

Borgatti, Stephen P, Mehra, Ajay, Brass, Daniel J, & Labianca, Giuseppe. (2009). Network
analysis in the social sciences. Science, 323(5916), 892-5. 

Brandes, Ulrik, Gaertler, Marco, & Wagner, Dorothea. (2008). Engineering graph 
clustering: Models and experimental evaluation. Journal of experimental algorithmics, 
12, 1-26. 

Brodka, Piotr, Stawiak, Pawel, & Kazienko, Przemyslaw. (2011). Shortest Path Discovery 
in the Multi-layered Social Network. Pages 497-501 of: 2011 international conference 
on advances in social networks analysis and mining. IEEE. 

Cai, Deng, Shao, Zheng, He, Xiaofei, Yan, Xifeng, & Han, Jiawei. (2005). Mining hidden 
community in heterogeneous social networks. Pages 58-65 of: Proceedings of the 3rd 
international workshop on link discovery - linkkdd ’05. New York, New York, USA: 
ACM Press. 

Combe, David, Largeron, Christine, Egyed-Zsigmond, Elod, & Gery, Mathias. (2012). 
Combining relations and text in scientific network clustering. Pages 1280-1285 of: 
2012 IEEE/ACM International Conference on Advances in Social Networks Analysis 
and Mining. 

Coscia, Michele, Giannotti, Fosca, & Pedreschi, Dino. (2011). A classification for 
community discovery methods in complex networks. Statistical analysis and data 
mining, 4(5), 512-546. 

Cruz, Juan David, & Bothorel, Cecile. (2013). Information integration for detecting 
communities in attributed graphs. Computational aspects of social networks (cason). 

Cruz, Juan David, Bothorel, Cecile, & Poulet, Francois. (2011a). Entropy based 
community detection in augmented social networks. Pages 163-168 of: Computational 
aspects of social networks (2011). 

Cruz, Juan David, Bothorel, Cecile, & Poulet, Francois. (2011b). Semantic clustering of 
social networks using points of view. Coria: conference en recherche d’information et 
applications 2011.

Cruz, Juan David, Bothorel, Cecile, & Poulet, Francois. (2012). Détection et visualisation
des communautés dans les réseaux sociaux. Revue d’intelligence artificielle, 26(4),
369-392.

Cruz, Juan David, Bothorel, Cecile, & Poulet, Francois. (2013). Integrating heterogeneous
information within a social network for detecting communities. 2013 IEEE/ACM 
International Conference on Advances in Social Networks Analysis and Mining. 

Cruz, Juan David, Bothorel, Cecile, & Poulet, Francois. (2014). Community detection and
visualization in social networks: Integrating structural and semantic information. Acm
trans. intell. syst. technol., 5(1), 11:1-11:26.

Dang, The Anh, & Viennet, Emmanuel. (2012). Community detection based on structural 
and attribute similarities. Pages 7-14 of: International conference on digital society 
(icds). ISBN: 978-1-61208-176-2. 

Danon, Leon, Díaz-Guilera, Albert, Duch, Jordi, & Arenas, Alex. (2005). Comparing
community structure identification. Journal of statistical mechanics: Theory and
experiment, 2005(09), P09008.

Fortunato, Santo. (2010). Community detection in graphs. Physics reports, 486(3-5), 75-174.

Fortunato, Santo, & Barthelemy, Marc. (2007). Resolution limit in community detection. 
Proceedings of the national academy of sciences of the united states of america, 104(1),
36-41. 

Freeman, Linton C. (1996). Cliques, galois lattices, and the structure of human social 
groups. Social networks, 18(3), 173-187.

Gaertler, Marco. (2005). Network analysis: Methodological foundations. Springer Berlin 
/ Heidelberg. Chap. Clustering, pages 178-215.

Gao, Jianxi, Buldyrev, Sergey V., Stanley, H. Eugene, & Havlin, Shlomo. (2011). Networks 
formed from interdependent networks. Nature physics, 8(1), 40-48. 

Ge, Rong, Ester, Martin, Gao, Byron J., Hu, Zengjian, Bhattacharya, Binay, & Ben-Moshe, 
Boaz. (2008). Joint cluster analysis of attribute data and relationship data: The connected 
k-center problem, algorithms and applications. Acm trans. knowl. discov. data, 2(2), 
7:1-7:35. 

Getoor, Lise, & Diehl, Christopher P. (2005). Link mining: a survey. Sigkdd explor. newsl.,
7(2), 3-12. 

Goffman, Erving. (1974). Frame analysis: an essay on the organization of experience. 
Harper colophon books ; CN 372. New York: Harper & Row. 

Gong, Neil Zhenqiang, Talwalkar, Ameet, Mackey, Lester W., Huang, Ling, Shin, 
Eui Chul Richard, Stefanov, Emil, Shi, Elaine, & Song, Dawn. (2011). Jointly 
predicting links and inferring attributes using a social-attribute network (san). Corr, 
abs/1112.3265. 

Günnemann, Stephan, Boden, Brigitte, Färber, Ines, & Seidl, Thomas. (2013). Efficient
mining of combined subspace and subgraph clusters in graphs with feature vectors. 
Pages 261-275 of: Advances in knowledge discovery and data mining. Springer. 

Han, Jiawei, Kamber, Micheline, & Pei, Jian. (2011). Data Mining: Concepts and 
Techniques, Third Edition (The Morgan Kaufmann Series in Data Management 
Systems). Morgan Kaufmann.

Hanisch, Daniel, Zien, Alexander, Zimmer, Ralf, & Lengauer, Thomas. (2002). Co-clustering
of biological networks and gene expression data. Bioinformatics, 18(suppl
1), S145-S154. 

Hric, Darko, Darst, Richard K., & Fortunato, Santo. (2014). Community detection in 
networks: structural clusters versus ground truth. Phys. Rev. E, 90, 062805.

Hubert, Lawrence, & Arabie, Phipps. (1985). Comparing partitions. Journal of 
classification, 2, 193-218. doi:10.1007/BF01908075.

Jaccard, Paul. (1901). Étude comparative de la distribution florale dans une portion des
Alpes et des Jura. Bulletin de la société vaudoise des sciences naturelles, 37, 547-579.

Karger, David R. (1993). Global min-cuts in RNC, and other ramifications of a simple
min-cut algorithm. Pages 21-30 of: Proceedings of the fourth annual acm-siam symposium
on discrete algorithms. SODA ’93. Philadelphia, PA, USA: Society for Industrial and 
Applied Mathematics. 

Kazienko, Przemysław, Brodka, Piotr, Musial, Katarzyna, & Gaworecki, Jarosław. (2010).
Multi-Layered Social Network Creation Based on Bibliographic Data. Pages 407-412 
of: 2010 ieee second international conference on social computing. IEEE. 

Kim, Myunghwan, & Leskovec, Jure. (2012). Multiplicative attribute graph model of real- 
world networks. Internet mathematics, 8(1-2), 113-160. 

Kivela, Mikko, Arenas, Alexandre, Barthelemy, Marc, Gleeson, James P, Moreno, Yamir, 
& Porter, Mason A. (2014). Multilayer Networks. Journal of complex networks, 1-69. 

Kohonen, Teuvo. (1997). Self-organizing maps. Secaucus, NJ, USA: Springer-Verlag New 
York, Inc. 

Kossinets, Gueorgi, & Watts, Duncan J. (2006). Empirical analysis of an evolving social 
network. Science, 311(5757), 88-90. 

La Fond, Timothy, & Neville, Jennifer. (2010). Randomization tests for distinguishing 
social influence and homophily effects. Pages 601-610 of: Proceedings of the 19th 
international conference on world wide web. WWW ’10. New York, NY, USA: ACM. 

Lancichinetti, Andrea, & Fortunato, Santo. (2011). Limits of modularity maximization in 
community detection. Physical review, e, statistical, nonlinear, and soft matter physics, 
84(6 Pt 2). 

Lazega, Emmanuel, & Pattison, Philippa E. (1999). Multiplexity, generalized exchange 
and cooperation in organizations: a case study. Social networks, 21(1), 67-90. 

Leskovec, Jure, Kleinberg, Jon, & Faloutsos, Christos. (2005). Graphs over time: 
densification laws, shrinking diameters and possible explanations. Pages 177-187 
of: Proceedings of the eleventh acm sigkdd international conference on knowledge 
discovery in data mining. ACM. 

Leskovec, Jure, Lang, Kevin J., & Mahoney, Michael. (2010). Empirical comparison of 
algorithms for network community detection. Pages 631-640 of: Proceedings of the 
19th international conference on world wide web. WWW ’10. New York, NY, USA: 
ACM. 

Li, Huajing, Nie, Zaiqing, Lee, Wang-Chien, Giles, C. Lee, & Wen, Ji-Rong. (2008).
Scalable community discovery on textual data with relations. In proceedings of the 
acm conference on information and knowledge management (cikm 2008). 

Li, WJ, & Yeung, DY. (2009). Relation regularized matrix factorization. Pages 1126-1131 
of: Ijcai-09. IJCAI’09. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Liu, Yan, Niculescu-Mizil, Alexandra, & Gryc, Wojciech. (2009). Topic-link LDA: Joint
models of topic and author community. Pages 665-672 of: Proceedings of the 26th 
annual international conference on machine learning. ICML ’09. New York, NY, USA: 
ACM. 

Magnani, Matteo, & Rossi, Luca. (2011). The ML-model for multi layer network analysis. 
Ieee international conference on advances in social network analysis and mining. IEEE 
Computer Society, Los Alamitos. 

Magnani, Matteo, & Rossi, Luca. (2013a). Formation of multiple networks. Pages 257-264
of: Social computing, behavioral-cultural modeling and prediction. Springer.

Magnani, Matteo, & Rossi, Luca. (2013b). Pareto Distance for Multilayer Network 
Analysis. Pages 249-256 of: Social computing, behavioral-cultural modeling and 
prediction. Springer. 

Magnani, Matteo, Micenkova, Barbora, & Rossi, Luca. (2013). Combinatorial Analysis of 
Multiple Networks. arXiv preprint.

Minor, Michael J. (1983). New directions in multiplexity analysis. Applied network 
analysis, 223-244. 

Moise, Gabriela, Zimek, Arthur, Kroger, Peer, Kriegel, Hans-Peter, & Sander, Jorg. (2009). 
Subspace and projected clustering: experimental evaluation and analysis. Knowledge 
and information systems, 21(3), 299-326. 

Moser, Flavia, Colak, Recep, Rafiey, Arash, & Ester, Martin. (2009). Mining cohesive 
patterns from graphs with feature vectors. Chap. 50, pages 593-604. 

Mucha, Peter J, Richardson, Thomas, Macon, Kevin, Porter, Mason A, & Onnela, Jukka- 
Pekka. (2010). Community structure in time-dependent, multiscale, and multiplex 
networks. Science, 328(5980), 876-8. 

Du, Nan, Wang, Hao, & Faloutsos, Christos. (2010). Analysis of Large Multi-modal
Social Networks: Patterns and a Generator. Balcazar, Jose Luis, Bonchi, Francesco, 
Gionis, Aristides, & Sebag, Michele (eds). Machine learning and knowledge discovery 
in databases. Lecture Notes in Computer Science, vol. 6321. Berlin, Heidelberg: 
Springer Berlin Heidelberg. 

Neville, Jennifer, Adler, Micah, & Jensen, David D. (2003). Clustering relational data 
using attribute and link information. Proceedings of the workshop on text mining and 
link analysis, eighteenth international joint conference on artificial intelligence. 

Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in 
networks. Physical review e, 69(2), 026113.

Nicosia, Vincenzo, Mangioni, Giuseppe, Carchiolo, Vincenza, & Malgeri, Michele. (2009). 
Extending the definition of modularity to directed graphs with overlapping communities. 
Journal of statistical mechanics: Theory and experiment, 2009(03), P03024. 

Noh, Jae Dong, & Rieger, Heiko. (2004). Random walks on complex networks. Physical 
review letters, 92(11), 118701. 

Palla, Gergely, Derenyi, Imre, Farkas, Illes, & Vicsek, Tamas. (2005). Uncovering the 
overlapping community structure of complex networks in nature and society. Nature, 
435(7043), 814-818. 

Pei, Jian, Jiang, Daxin, & Zhang, Aidong. (2005). On mining cross-graph quasi-cliques. 
Page 228 of: Proceeding of the eleventh acm sigkdd international conference on 
knowledge discovery in data mining - kdd ’05. New York, USA: ACM Press.

Pool, Simon, Bonchi, Francesco, & van Leeuwen, Matthijs. (2014). Description-driven
community detection. Acm trans. intell. syst. technol., 5(2), 28:1-28:28.

Rand, William M. (1971). Objective criteria for the evaluation of clustering methods. 
Journal of the american statistical association, 66(336), pp. 846-850. 

Rissanen, Jorma. (1983). A universal prior for integers and estimation by minimum
description length. Annals of statistics, 11(2), 416-431.

Rocklin, Matthew, & Pinar, Ali. (2011). On Clustering on Graphs with Multiple Edge 
Types. arXiv preprint, Sept.

Rossetti, Giulio, Berlingerio, Michele, & Giannotti, Fosca. (2011). Scalable Link 
Prediction on Multidimensional Networks. Pages 979-986 of: 2011 ieee 11th 
international conference on data mining workshops. IEEE. 

Ruan, Yiye, Fuhry, David, & Parthasarathy, Srinivasan. (2013). Efficient community 
detection in large networks using content and links. Pages 1089-1098 of: Proceedings 
of the 22nd international conference on world wide web. WWW ’13. 

Schaeffer, Satu Elisa. (2007). Graph clustering. Computer science review, 1, 27-64. 

Shi, Jianbo, & Malik, Jitendra. (2000). Normalized cuts and image segmentation. Pattern 
analysis and machine intelligence, ieee transactions on, 22(8), 888-905. 

Silva, Arlei, Meira, Jr., Wagner, & Zaki, Mohammed J. (2010). Structural correlation 
pattern mining for large graphs. Pages 119-126 of: Proceedings of the eighth workshop 
on mining and learning with graphs. MLG ’10. New York, NY, USA: ACM.

Skvoretz, John, & Agneessens, Filip. (2007). Reciprocity, multiplexity, and exchange: 
Measures. Quality & quantity, 41(3), 341-357. 

Stein, Benno, & Niggemann, Oliver. (1999). On the nature of structure and its 
identification. Pages 122-134 of: Graph-theoretic concepts in computer science. 
Springer. 

Steinhaeuser, Karsten, & Chawla, Nitesh V. (2008). Community detection in a large
real-world social network. Pages 168-175 of: Liu, Huan, Salerno, John J., & Young,
Michael J. (eds). Social computing, behavioral modeling, and prediction. Springer US.

Sun, Yizhou, Han, Jiawei, Aggarwal, Charu C., & Chawla, Nitesh V. (2012). When will 
it happen? Page 663 of: Proceedings of the fifth acm international conference on web 
search and data mining - wsdm ’12. New York, USA: ACM Press. 

Tang, Lei, Wang, Xufei, & Liu, Huan. (2011). Community detection via heterogeneous 
interaction analysis. Data mining and knowledge discovery, 25(1), 1-33. 

Tong, Hanghang, Faloutsos, Christos, & Koren, Yehuda. (2007). Fast direction-aware 
proximity for graph mining. Pages 747-756 of: Proceedings of the 13th acm sigkdd 
international conference on knowledge discovery and data mining. KDD ’07. New 
York, NY, USA: ACM. 

van Laarhoven, Twan, & Marchiori, Elena. (2013). An axiomatic study of objective 
functions for graph clustering. Tech. rept. CoRR, abs/1308.3383. 

Villa-Vialaneix, Nathalie, Olteanu, Madalina, & Cierco-Ayrolles, Christine. (2013). Carte 
auto-organisatrice pour graphes étiquetés. Page Article numéro 4 of: Atelier Fouilles de
Grands Graphes (FGG) - EGC’2013. 

Vinh, Nguyen Xuan, Epps, Julien, & Bailey, James. (2010). Information theoretic 
measures for clusterings comparison: Variants, properties, normalization and correction 
for chance. J. mach. learn. res., 11(Dec.), 2837-2854.

Wang, Bing, Cao, Lang, Suzuki, Hideyuki, & Aihara, Kazuyuki. (2011). Epidemic spread 
in adaptive networks with multitype agents. Journal of physics a: Mathematical and 
theoretical, 44(3), 035101.

Wang, Jianyong, Zeng, Zhiping, & Zhou, Lizhu. (2006). CLAN: An Algorithm for Mining
Closed Cliques from Large Dense Graph Databases. Pages 73-73 of: 22nd international 
conference on data engineering (icde’06). IEEE. 

Wasserman, Stanley, & Faust, Katherine. (1994). Social Network Analysis: Methods and 
Applications. Structural analysis in the social sciences, vol. 8. Cambridge
University Press. 

Xie, Jierui, Kelley, Stephen, & Szymanski, Boleslaw K. (2013). Overlapping community 
detection in networks: The state-of-the-art and comparative study. Acm computing 
surveys (csur), 45(4), 43. 

Xu, Zhiqiang, Ke, Yiping, Wang, Yi, Cheng, Hong, & Cheng, James. (2012). A model- 
based approach to attributed graph clustering. Pages 505-516 of: Proceedings of the 
2012 acm sigmod international conference on management of data. SIGMOD ’12. New
York, NY, USA: ACM. 

Yang, Jaewon, McAuley, Julian, & Leskovec, Jure. (2013, Dec.). Community detection in
networks with node attributes. Pages 1151-1156 of: Data mining (icdm), 2013 ieee 13th 
international conference on. 

Yang, Shuang-Hong, Long, Bo, Smola, Alex, Sadagopan, Narayanan, Zheng, Zhaohui, & 
Zha, Hongyuan. (2011). Like like alike: joint friendship and interest propagation in 
social networks. Pages 537-546 of: Proceedings of the 20th international conference 
on world wide web, www. ACM. 

Yang, Tianbao, Jin, Rong, Chi, Yun, & Zhu, Shenghuo. (2009). Combining link and content 
for community detection: a discriminative approach. Pages 927-936 of: Kdd ’09: 
Proceedings of the 15th acm sigkdd international conference on knowledge discovery 
and data mining. New York, NY, USA: ACM. 

Yin, Zhijun, Gupta, Manish, Weninger, Tim, & Han, Jiawei. (2010a). Linkrec: a unified 
framework for link recommendation with user attributes and graph structure. Pages 
1211-1212 of: Proceedings of the 19th international conference on world wide web. 
WWW ’10. New York, NY, USA: ACM.

Yin, Zhijun, Gupta, M., Weninger, T., & Han, Jiawei. (2010b). A unified framework for link
recommendation using random walks. Pages 152-159 of: Advances in social networks 
analysis and mining (asonam), 2010 international conference on. 

Zhao, Peixiang, Li, Xiaolei, Xin, Dong, & Han, Jiawei. (2011). Graph cube: on 
warehousing and olap multidimensional networks. Pages 853-864 of: Proceedings of 
the 2011 acm sigmod international conference on management of data. ACM. 

Zheleva, Elena, Sharara, Hossam, & Getoor, Lise. (2009). Co-evolution of social and 
affiliation networks. 15th acm sigkdd conference on knowledge discovery and data 
mining (kdd). 

Zeng, Zhiping, & Wang, Jianyong. (2006). Coherent closed quasi-clique discovery from large
dense graph databases. Proceedings of the 12th acm sigkdd international conference on
knowledge discovery and data mining - kdd ’06.

Zhou, Yang, Cheng, Hong, & Yu, Jeffrey Xu. (2009). Graph clustering based on 
structural/attribute similarities. Proc. vldb endow., 2(1), 718-729.

Zhou, Yang, Cheng, Hong, & Yu, Jeffrey Xu. (2010). Clustering large attributed graphs: 
An efficient incremental approach. Pages 689-698 of: Data mining (icdm), 2010 ieee 
10th international conference on. IEEE. 



